This chapter introduces the basic features of the pixel 
array. I explain how the pixel array is digitized from the 
image plane, how pixel values are related to brightness 
and color, and why most imaging systems use pixel 
values that are nonlinearly related to light intensity. 

Imaging 

In human vision, the three-dimensional world is imaged 
by the lens of the eye onto the retina, which is popu- 
lated with photoreceptor cells that respond to light 
having wavelengths ranging from about 400 nm to 
700 nm. In video and in film, we build a camera having 
a lens and a photosensitive device, to mimic how the 
world is perceived by vision. Although the shape of the 
retina is roughly a section of a sphere, it is topologi- 
cally two dimensional. In a camera, for practical 
reasons, we employ a flat image plane, sketched in 
Figure 1.1 below, instead of a section of a sphere. Image 
science concerns analyzing the continuous distribution 
of optical power that is incident on the image plane. 
Figure 1.2 Aspect ratios of video,
HDTV, and film are compared.
Aspect ratio is properly written
width:height (not height:width).

Schubin, Mark, "Searching for the
Perfect Aspect Ratio," in SMPTE
Journal 105 (8): 460-478 (Aug.
1996). The 1.85:1 aspect ratio is
achieved with a spherical lens (as
opposed to the aspherical lens
used for anamorphic images).



The 2.39:1 ratio for cinema film is 
recent; formerly, 2.35:1 was used. 
The term anamorphic in video usually 
refers to a 16:9 widescreen variant 
of a base video standard, where the 
horizontal dimension of the 16:9 
image is transmitted in the same 
time interval as the 4:3 aspect ratio 
standard. See page 99. 



Aspect ratio 

Aspect ratio is simply the ratio of an image's width to its 
height. Standard aspect ratios for film and video are 
sketched, to scale, in Figure 1.2 above. Conventional 
standard-definition television (SDTV) has an aspect ratio 
of 4:3. Widescreen refers to an aspect ratio wider than 
4:3. Widescreen television and high-definition televi- 
sion (HDTV) have an aspect ratio of 16:9. Cinema film 
commonly uses 1.85:1 ("flat," or "spherical"). In Europe 
and Asia, 1.66:1 is usually used. 

To obtain 2.39:1 aspect ratio ("Cinemascope," or
colloquially, "scope"), film is typically shot with an
aspherical lens that squeezes the horizontal dimension
of the image by a factor of two. The projector is
equipped with a similar lens, to restore the horizontal
dimension of the projected image. The lens and the
technique are called anamorphic. In principle, an
anamorphic lens can have any ratio; in practice, a ratio
of two is ubiquitous.

Film can be transferred to 4:3 video by cropping the 
sides of the frame, at the expense of losing some 
picture content. Pan-and-scan, sketched in Figure 1.3 
opposite, refers to choosing, on a scene-by-scene basis 
during film transfer, the 4:3 region to be maintained. 

Many directors and producers prefer their films not to 
be altered by cropping, so many movies on VHS and 
DVD are released in letterbox format, sketched in 
Figure 1.4 opposite. In letterbox format, the entire film 
image is maintained, and the top and bottom of the 4:3 
frame are unused. (Either gray or black is displayed.) 
Figure 1.3 Pan-and-scan
crops the width of widescreen
material (here, 16:9) for
a 4:3 aspect ratio display.



Figure 1.4 Letterbox
format fits widescreen
material (here, 16:9) to
the width of a 4:3 display.

Figure 1.5 Pillarbox format
(sometimes called sidebar) fits
narrow-aspect-ratio material
to the height of a 16:9 display.



With the advent of widescreen consumer television
receivers, it is becoming common to see 4:3 material
displayed on widescreen displays in pillarbox format, as
sketched in Figure 1.5. The full height of the display is
used, and the left and right of the widescreen frame are
blanked.
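The geometry of these three conversions reduces to a comparison of aspect ratios. The Python sketch below assumes square pixels and uses hypothetical display sizes and function names (`fit_inside`, `pan_and_scan_width`); it illustrates the arithmetic only, not the practice of any particular transfer facility.

```python
from fractions import Fraction


def fit_inside(src_aspect: Fraction, disp_w: int, disp_h: int) -> tuple[int, int]:
    """Largest (width, height) with aspect ratio src_aspect that fits on a
    disp_w x disp_h display without cropping (square pixels assumed)."""
    disp_aspect = Fraction(disp_w, disp_h)
    if src_aspect > disp_aspect:
        # Source is wider than the display: letterbox (bars top and bottom).
        return disp_w, round(disp_w / src_aspect)
    # Source is narrower (or equal): pillarbox (bars left and right).
    return round(disp_h * src_aspect), disp_h


def pan_and_scan_width(src_h: int, disp_aspect: Fraction) -> int:
    """Width of the region kept when the sides of the source are cropped so
    that its full height fills a display of aspect ratio disp_aspect."""
    return round(src_h * disp_aspect)


print(fit_inside(Fraction(16, 9), 640, 480))     # letterbox: (640, 360)
print(fit_inside(Fraction(4, 3), 1920, 1080))    # pillarbox: (1440, 1080)
print(pan_and_scan_width(1080, Fraction(4, 3)))  # pan-and-scan: 1440
```

Exact fractions are used so that ratios such as 16:9 compare without rounding error.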

Digitization 

Signals captured from the physical world are translated 
into digital form by digitization, which involves two 
processes, sketched in Figure 1.6 overleaf. A signal is 
digitized by subjecting it to both sampling (in time or 
space) and quantization (in amplitude). The operations 
may take place in either order, though sampling usually 
precedes quantization. Quantization assigns an integer 
to signal amplitude at an instant of time or a point in 
space, as I will explain in Quantization, on page 17. 

A continuous one-dimensional function of time, such as
the sound pressure of an audio signal, is sampled by
forming a series of discrete values, each of which is
a function of the distribution of intensity across a small
interval of time. Uniform sampling, where the time
intervals are of equal duration, is nearly always used.
Details will be presented in Filtering and sampling, on
page 141.
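A minimal Python sketch of one-dimensional digitization, assuming point sampling (each sample takes the signal's instantaneous value, a simplification of the small-interval weighting described above) followed by a uniform quantizer with clipping. The test tone, sampling rate, and bit depth are illustrative, and the function names are hypothetical.

```python
import math


def digitize(signal, duration: float, rate: float, bits: int) -> list[int]:
    """Sample signal(t) (values in [-1, 1]) at uniform intervals of 1/rate
    seconds, then quantize each sample to an unsigned bits-bit integer."""
    top = (1 << bits) - 1                  # highest code, e.g., 255 for 8 bits
    codes = []
    for n in range(round(duration * rate)):
        x = signal(n / rate)               # sampling: equally spaced instants
        code = math.floor((x + 1.0) / 2.0 * top + 0.5)  # nearest level
        codes.append(min(max(code, 0), top))            # clip out-of-range input
    return codes


def tone(t: float) -> float:
    """Hypothetical test signal: a 1 kHz sine wave."""
    return math.sin(2.0 * math.pi * 1000.0 * t)


print(digitize(tone, duration=0.001, rate=8000, bits=8))
```

Here sampling precedes quantization, the usual order mentioned above.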

A continuous two-dimensional function of space is 
sampled by assigning, to each element of a sampling 
grid (or lattice), a value that is a function of the distri- 
bution of intensity over a small region of space. In 
digital video and in conventional image processing, the 
samples lie on a regular, rectangular grid. 
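The following Python sketch illustrates area sampling on a rectangular lattice, assuming uniform (box) weighting over each cell, approximated by a few point evaluations; real cameras and scanners use other weighting functions. The function and test-image names are hypothetical.

```python
def area_sample(f, cols: int, rows: int, sub: int = 4) -> list[list[float]]:
    """Sample a continuous image f(x, y) on the unit square (x rightward,
    y downward) onto a rectangular grid of rows x cols. Each sample is the
    mean of sub x sub point evaluations spread evenly over its cell, which
    approximates averaging intensity over a small region of space."""
    image = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = 0.0
            for j in range(sub):
                for i in range(sub):
                    x = (c + (i + 0.5) / sub) / cols
                    y = (r + (j + 0.5) / sub) / rows
                    total += f(x, y)
            row.append(total / (sub * sub))
        image.append(row)
    return image


def ramp(x: float, y: float) -> float:
    """Hypothetical continuous image: intensity rising left to right."""
    return x


for row in area_sample(ramp, cols=4, rows=2):
    print([round(v, 3) for v in row])   # [0.125, 0.375, 0.625, 0.875]
```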
Figure 1.6 Digitization
comprises sampling and
quantization, in either order.
Sampling density, expressed
in units such as pixels per
inch (ppi), relates to resolution.
Quantization relates to
the number of bits per pixel
(bpp). Total data rate or data
capacity depends upon the
product of these two factors.

[Figure 1.6 panels: Sample (time/space) and Quantize
(amplitude) together Digitize.]

Samples need not be digital: a charge-coupled device 
(CCD) camera is inherently sampled, but it is not inher- 
ently quantized. Analog video is not sampled horizon- 
tally but is sampled vertically by scanning and sampled 
temporally at the frame rate. 



In video and computing, a pixel 
comprises the set of all components 
necessary to represent color. Excep- 
tionally, in the terminology of digital 
still camera imaging devices, a pixel 
is any component individually. 



Pixel array 

A digital image is represented by a rectangular array 
(matrix) of picture elements (pels, or pixels). In 
a grayscale system, each pixel comprises a single 
component whose value is related to what is loosely 
called brightness. In a color system, each pixel 
comprises several components - usually three - whose 
values are closely related to human color perception. 



In multispectral imaging, each pixel has two or more 
components, representing power from different wave- 
length bands. Such a system may be described as 
having color, but multispectral systems are usually 
designed for purposes of science, not vision: A set of 
pixel component values in a multispectral system 
usually has no close relationship to color perception. 



Each component of a pixel has a value that depends 
upon the brightness and color in a small region 
surrounding the corresponding point in the sampling 
lattice. Each component is usually quantized to an 
integer value occupying between 1 and 16 bits - often 
8 bits - of digital storage. 






DIGITAL VIDEO AND HDTV ALGORITHMS AND INTERFACES 



Figure 1.7 Pixel arrays of
several imaging standards are
shown, with their counts of
image columns and rows.
480/29.97 SDTV, indicated
here as 720×480, and SIF
have nonsquare sampling.
Analog SDTV broadcast may
contain a few more than
480 picture lines; see Picture
lines, on page 324. For
explanations of QCIF and
SIF, see Glossary of video
signal terms, on page 609.

[Figure 1.7 panels include: High-Definition Television
(HDTV), 1 Mpx; Workstation, 1 Mpx; High-Definition
Television (HDTV), 2 Mpx; PC/Mac UXGA, 2 Mpx.]



The pixel array is stored in digital 
memory. In video, the memory 
containing a single image is called 
a framestore. In computing, it's 
called a framebuffer. 



I prefer the term density to pitch: 
It isn't clear whether the latter 
refers to the dimension of an 
element, or to the number of 
elements per unit distance. 



ITU-T Group 4 fax is standardized 
with about 195.9 ppi horizontally 
and 204.1 ppi vertically, but that is 
now academic since computer fax 
systems assume square sampling 
with exactly 200 pixels/inch. 



A typical video camera or digital still camera has, in the
image plane, one or more CCD image sensors, each
containing hundreds of thousands (or perhaps a small
number of millions) of photosites in a lattice. The total
number of pixels in an image is simply the product of
the number of image columns (technically, samples per
active line, S_AL) and the number of image rows (active
lines, L_A). The total pixel count is often expressed in
kilopixels (Kpx) or megapixels (Mpx). Pixel arrays of
several image standards are sketched in Figure 1.7. Scan
order is conventionally left to right, then top to bottom,
numbering rows and columns from [0, 0] at the top left.
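A small Python sketch of these two bookkeeping rules, assuming the index order [row, column] (the text does not say which index comes first). The function names are hypothetical.

```python
def pixel_count(s_al: int, l_a: int) -> int:
    """Total pixels: samples per active line (S_AL) times active lines (L_A)."""
    return s_al * l_a


def scan_index(row: int, col: int, s_al: int) -> int:
    """Position of pixel [row, col] in left-to-right, top-to-bottom scan
    order, with [0, 0] at the top left."""
    return row * s_al + col


for name, s_al, l_a in (("SDTV", 720, 480), ("HDTV", 1920, 1080)):
    n = pixel_count(s_al, l_a)
    print(f"{name}: {n} pixels = {n / 1e6:.2f} Mpx")

print(scan_index(1, 0, 720))   # first pixel of the second row: 720
```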

A system that has equal horizontal and vertical sample 
density is said to have square sampling. In a system with 
square sampling, the number of samples across the 
picture width is the product of the aspect ratio and the 
number of picture lines. (The term square refers to the 
sample density; square does not mean that image infor- 
mation associated with each pixel is distributed 
uniformly throughout a square region.) 

In computing, it is standard to use square sampling. 
Some imaging and video systems use sampling lattices 
where the horizontal and vertical sample pitch are 
unequal: nonsquare sampling. This situation is some- 
times misleadingly referred to as "rectangular sampling," 
but a square is also a rectangle! 
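Under square sampling the width in samples follows directly from the aspect ratio, so a mismatch between that figure and a standard's actual sample count signals nonsquare sampling. A Python sketch using exact fractions to avoid rounding; the function name is hypothetical.

```python
from fractions import Fraction


def square_sampling_width(aspect: Fraction, picture_lines: int) -> Fraction:
    """Samples across the picture width in a square-sampled system:
    the aspect ratio times the number of picture lines."""
    return aspect * picture_lines


print(square_sampling_width(Fraction(4, 3), 480))    # 640
print(square_sampling_width(Fraction(16, 9), 720))   # 1280
print(square_sampling_width(Fraction(16, 9), 1080))  # 1920

# 480-line SDTV carries 720 samples per line, not 640, so its
# sampling is nonsquare.
print(square_sampling_width(Fraction(4, 3), 480) == 720)  # False
```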



RASTER IMAGES 



CHAPTER 1 



Figure 1.8 Snellen chart

Figure 1.9 Astronomers' rule of thumb



Visual acuity 

When an optometrist measures your visual acuity, he or 
she may use the Snellen chart, represented in Figure 1.8 
in the margin. The results of this test depend upon 
viewing distance. The test is standardized for a viewing 
distance of 20 feet. At that distance, the strokes of the
letters in the 20/20 row subtend one sixtieth of
a degree (1/60°, one minute of arc). This is roughly the
limit of angular discrimination of normal vision.
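As a worked example, the Python sketch below computes the size of a detail that subtends one minute of arc at the standard 20-foot test distance. The function name is illustrative.

```python
import math


def subtended_size(distance: float, arcminutes: float = 1.0) -> float:
    """Size, in the units of distance, of a detail centered on the line of
    sight that subtends the given visual angle."""
    half_angle = math.radians(arcminutes / 60.0) / 2.0
    return 2.0 * distance * math.tan(half_angle)


stroke = subtended_size(20 * 12)   # 20 feet, expressed in inches
print(f"20/20 stroke width: {stroke:.4f} in ({stroke * 25.4:.2f} mm)")
```

The result, about 0.07 inch (1.8 mm), shows how fine a detail one minute of arc represents at that distance.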

Visual angles can be estimated using the astronomers' 
rule of thumb depicted in Figure 1.9 in the margin: 
When held at arm's length, the joint of the thumb 
subtends about two degrees. The full palm subtends 
about ten degrees, and the nail of the little finger 
subtends about one degree. (The angular subtense of 
the full moon is about half a degree.) 

Viewing distance and angle 

If you display a white flatfield on a CRT with typical
spot size, scan-line structure is likely to be visible if the
viewer is located closer than the distance where
adjacent image rows (scan lines) at the display surface
subtend an angle of one minute of arc (1/60°) or more.

To achieve viewing where scan-line pitch subtends 1/60°,
viewing distance should be about 3400 times the
distance d between scan lines, that is, 3400 divided by
the scan-line density (e.g., in pixels per inch, ppi):

$$\text{distance} \approx 3400 \cdot d = \frac{3400}{\text{ppi}}; \qquad 3400 \approx \frac{1}{\sin\left(\tfrac{1}{60}^{\circ}\right)} \qquad \text{Eq 1.1}$$

At that distance, there are about 60 pixels per degree.
Viewing distance expressed numerically as a multiple of
picture height should be approximately 3400 divided
by the number of image rows (L_A):

$$\text{distance} \approx \frac{3400}{L_A} \times \text{PH} \qquad \text{Eq 1.2}$$
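The two equations are easy to evaluate. The Python sketch below, with hypothetical function names, also converts Eq 1.2 from picture heights to display diagonals using the Pythagorean relation between width, height, and diagonal. It reproduces the distances of about seven and 3.1 picture heights that the text gives for SDTV and HDTV.

```python
import math

# Eq 1.1: the exact factor is 1 / sin(1/60 degree), about 3438;
# the text rounds it to 3400.
EXACT_FACTOR = 1.0 / math.sin(math.radians(1.0 / 60.0))
FACTOR = 3400.0


def distance_from_density(ppi: float) -> float:
    """Eq 1.1: viewing distance, in inches, at which the scan-line pitch
    subtends about one minute of arc."""
    return FACTOR / ppi


def distance_in_heights(image_rows: int) -> float:
    """Eq 1.2: viewing distance as a multiple of picture height (PH)."""
    return FACTOR / image_rows


def distance_in_diagonals(image_rows: int, aspect_w: int, aspect_h: int) -> float:
    """Eq 1.2 re-expressed as a multiple of the display diagonal."""
    height_over_diagonal = aspect_h / math.hypot(aspect_w, aspect_h)
    return distance_in_heights(image_rows) * height_over_diagonal


print(round(EXACT_FACTOR))                                     # 3438
print(f"SDTV: {distance_in_heights(480):.2f} PH, "
      f"{distance_in_diagonals(480, 4, 3):.2f} diagonals")
print(f"HDTV: {distance_in_heights(1080):.2f} PH, "
      f"{distance_in_diagonals(1080, 16, 9):.2f} diagonals")
```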

SDTV has about 480 image rows (picture lines). The
scan-line pitch subtends 1/60° at a distance of about
seven times picture height (PH), as sketched in
Figure 1.10 opposite, giving roughly 600 pixels across
the picture width. Picture angle is about 11°, as shown
in Figure 1.11. With your hand held at arm's length,
your palm ought to just cover the width of the picture.
This distance is about 4.25 times the display diagonal,
as sketched in Figure 1.12 in the margin. For HDTV with
1080 image rows, the viewing distance that yields the
1/60° scan-line subtense is about 3.1 PH (see the bottom
of Figure 1.10), about 1.5 times the display diagonal.

Figure 1.10 Scan lines become invisible at
approximately the viewing distance where the
scan-line pitch subtends an angle of about one
minute of arc (1/60°) at the display surface.
This is roughly the limit of angular
discrimination for normal vision.

Figure 1.11 Picture angle of SDTV, sketched
at the top, is about 11° horizontally and 8°
vertically, where scan lines are invisible. In
1920×1080 HDTV, horizontal angle can
increase to about 33° and vertical angle to
about 18°, preserving the scan-line subtense.

Figure 1.12 Picture height at
an aspect ratio of 4:3 is 3/5 of
the diagonal; optimum viewing
distance for conventional video
is 4.25 times the diagonal.
Picture height at 16:9 is about
half the diagonal; optimum
viewing distance for 2 Mpx
HDTV is 1.5 times the diagonal.

For SDTV, the total horizontal picture angle at that
viewing distance is about 11°. Viewers tend to choose
a viewing distance that renders scan lines invisible;
angular subtense of a scan line (or pixel) is thereby
preserved. Thus, the main effect of higher pixel count is
to enable viewing at a wide picture angle. For
1920×1080 HDTV, horizontal viewing angle is tripled
to 33°, as sketched in Figure 1.11. The "high definition"
of HDTV does not squeeze six times the number of
pixels into the same visual angle! Instead, the entire
image can potentially occupy a much larger area of the
viewer's visual field.
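A rough Python check of these angles, assuming a small-angle approximation in which each pixel spans 1/60° and counting SDTV as 640 square-sampled columns. It gives about 10.7° by 8° for SDTV and 32° by 18° for HDTV, close to the approximate figures quoted above. The function name is hypothetical.

```python
def picture_angle(pixels: int, pixels_per_degree: float = 60.0) -> float:
    """Approximate picture angle, in degrees, spanned by a row or column of
    pixels when each pixel subtends 1/60 degree (small-angle estimate)."""
    return pixels / pixels_per_degree


for name, cols, rows in (("SDTV", 640, 480), ("HDTV", 1920, 1080)):
    print(f"{name}: {picture_angle(cols):.1f} x {picture_angle(rows):.1f} degrees")

print((1920 * 1080) / (720 * 480))   # HDTV has 6.0 times the pixels of SDTV
```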
Figure 1.13 Spatiotemporal
domains



Spatiotemporal domains 

A sequence of still pictures captured and displayed at 
a sufficiently high rate - typically between 24 and 60 
pictures per second - can create the illusion of motion, 
as I will describe on page 51. Sampling in time, in 
combination with 2-D (spatial) sampling, causes digital 
video to be sampled in three axes - horizontal, vertical, 
and temporal - as sketched in Figure 1.13 above. One- 
dimensional sampling theory, to be detailed in Filtering 
and sampling, on page 141, applies along each axis. 

At the left of Figure 1.13 is a sketch of a two-dimen- 
sional spatial domain of a single image. Some image 
processing operations, such as certain kinds of filtering, 
can be performed separately on the horizontal and 
vertical axes, and have an effect in the spatial domain - 
these operations are called separable. Other processing 
operations cannot be separated into horizontal and 
vertical facets, and must be performed directly on 
a two-dimensional sample array. Two-dimensional 
sampling will be detailed in Image digitization and 
reconstruction, on page 187. 
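Separability can be checked numerically. The Python sketch below (hypothetical function names; edges handled by replicating border samples, which is an assumption) filters a small array with a 3-tap kernel along rows and then along columns, and compares the result with direct 2-D filtering by the equivalent 3×3 kernel, which is the outer product of the taps.

```python
def filter_rows(img: list[list[float]], taps: list[float]) -> list[list[float]]:
    """Apply a 1-D FIR filter along each row, replicating edge samples.
    (Correlation form; identical to convolution for symmetric taps.)"""
    half = len(taps) // 2
    out = []
    for row in img:
        w = len(row)
        out.append([
            sum(t * row[min(max(c + k - half, 0), w - 1)] for k, t in enumerate(taps))
            for c in range(w)
        ])
    return out


def transpose(img: list[list[float]]) -> list[list[float]]:
    return [list(col) for col in zip(*img)]


def separable_filter(img, taps):
    """Horizontal pass, then vertical pass (a row pass on the transpose)."""
    horizontal = filter_rows(img, taps)
    return transpose(filter_rows(transpose(horizontal), taps))


def filter_2d(img, kernel):
    """Direct 2-D filtering with a full kernel, replicating edge samples."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(h):
        out_row = []
        for c in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr = min(max(r + i - kh // 2, 0), h - 1)
                    cc = min(max(c + j - kw // 2, 0), w - 1)
                    acc += kernel[i][j] * img[rr][cc]
            out_row.append(acc)
        out.append(out_row)
    return out


taps = [0.25, 0.5, 0.25]
kernel = [[a * b for b in taps] for a in taps]   # 3x3 outer product
img = [[float((7 * r + 3 * c) % 10) for c in range(5)] for r in range(4)]

sep, full = separable_filter(img, taps), filter_2d(img, kernel)
print(max(abs(a - b) for ra, rb in zip(sep, full) for a, b in zip(ra, rb)))
```

The separable form needs 3 + 3 multiplications per sample instead of 9, which is the practical appeal of separability; a kernel that is not an outer product cannot be split this way.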
See Appendix B, Introduction to
radiometry and photometry, on
page 601.



The term luminance is often care- 
lessly and incorrectly used to refer 
to luma; see below. In image 
reproduction, we are usually 
concerned not with (absolute) 
luminance, but with relative lumi- 
nance, to be detailed on page 206. 



Regrettably, many practitioners of 
computer graphics, and of digital 
image processing, have a cavalier 
attitude toward these terms. In the 
HSB, HSI, HSL, and HSV systems,
B allegedly stands for brightness,
I for intensity, L for lightness, and
V for value. None of these systems
computes brightness, intensity,
luminance, or value according to
any definition that is recognized in
color science!



Lightness terminology 

In a grayscale image, each pixel value represents what is 
loosely called brightness. However, brightness is defined 
formally as the attribute of a visual sensation according 
to which an area appears to emit more or less light. This 
definition is obviously subjective, so brightness is an 
inappropriate metric for image data. 

Intensity is radiant power in a particular direction; 
radiance is intensity per unit projected area. These 
terms disregard wavelength composition. But in color 
imaging, wavelength is important! Neither of these 
quantities is a suitable metric for color image data. 

Luminance is radiance weighted by the spectral sensi- 
tivity associated with the brightness sensation of vision. 
Luminance is proportional to intensity. Imaging systems 
rarely use pixel values proportional to luminance; values 
nonlinearly related to luminance are usually used. 

Illuminance is luminance integrated over a half-sphere. 

Lightness - formally, CIE L* - is the standard approxi- 
mation to the perceptual response to luminance. It is 
computed by subjecting luminance to a nonlinear 
transfer function that mimics vision. A few grayscale 
imaging systems have pixel values proportional to L*. 
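For reference, the CIE 1976 formula for L* can be sketched in Python as follows. This is the standard CIE definition (a cube root with a linear segment near black), not something specific to this book, and the function name is hypothetical.

```python
def cie_lightness(y: float, y_n: float = 1.0) -> float:
    """CIE 1976 lightness L* (0 to 100) from luminance Y relative to white Yn."""
    t = y / y_n
    if t > (6 / 29) ** 3:                 # about 0.008856
        return 116.0 * t ** (1 / 3) - 16.0
    return (29 / 3) ** 3 * t              # linear segment, slope about 903.3


for y in (0.0, 0.01, 0.18, 0.5, 1.0):
    print(f"Y = {y:4.2f}  ->  L* = {cie_lightness(y):5.1f}")
```

A relative luminance of 18% maps to an L* of about 49.5, roughly midway between black and white.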

Value refers to measures of lightness apart from CIE L*. 
In image science, value is rarely - if ever - used in any 
sense consistent with accurate color. (Several different 
value scales are graphed in Figure 20.2 on page 208.) 

Color images are sensed and reproduced based upon 
tristimulus values, whose amplitudes are proportional to 
intensity but whose spectral compositions are carefully 
chosen according to the principles of color science. As 
their name implies, tristimulus values come in sets of 3. 

The image sensor of a digital camera produces values,
proportional to radiance, that approximate red, green,
and blue (RGB) tristimulus values. (I call these values
linear-light.) However, in most imaging systems, RGB
tristimulus values are subject to a nonlinear transfer
function (gamma correction) that mimics the perceptual
response. Most imaging systems use RGB values
that are not proportional to intensity. The notation
R'G'B' denotes the nonlinearity.

See Appendix A, YUV and luminance
considered harmful, on page 595.

Figure 1.14 Contrast sensitivity
test pattern reveals that
a just-noticeable difference
(JND) occurs when the step
between luminance levels is
1% of Y.

[Figure 1.15 scale: linear-light codes 0 to 255; the step
is 4% at codes 25/26, 1% at 100/101, and 0.5% at
200/201; codes 100 through 255 span only 2.55:1.]

Figure 1.15 The "code 100"
problem with linear-light
coding is that at code levels
below 100, the steps between
code values have ratios larger
than the visual threshold: The
steps are liable to be visible.

Luma (Y') is formed as a suitably weighted sum of
R'G'B'; it is the basis of luma/color difference coding.
Luma is comparable to lightness; it is often carelessly
and incorrectly called luminance by video engineers.
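As a concrete illustration, the Python sketch below forms luma with the Rec. ITU-R BT.601 weights 0.299, 0.587, and 0.114. Those weights are an assumption here: the book introduces luma coefficients later, and other standards such as BT.709 use different ones. The weights apply to gamma-corrected R'G'B', not to linear-light RGB.

```python
def luma_601(r_prime: float, g_prime: float, b_prime: float) -> float:
    """Luma Y' as a weighted sum of gamma-corrected R'G'B' in [0, 1],
    using BT.601 weights (which sum to 1, so white maps to Y' = 1)."""
    return 0.299 * r_prime + 0.587 * g_prime + 0.114 * b_prime


print(luma_601(1.0, 1.0, 1.0))   # white
print(luma_601(0.0, 1.0, 0.0))   # pure green
```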

Nonlinear image coding 

Vision cannot distinguish two luminance levels if the 
ratio between them is less than about 1.01 - in other 
words, the visual threshold for luminance difference is 
about 1%. This contrast sensitivity threshold is
established by experiments using a test pattern such as
the one sketched in Figure 1.14 in the margin; details will
be presented in Contrast sensitivity, on page 198.

Consider pixel values proportional to luminance, where 
code zero represents black, and the maximum code 
value of 255 represents white, as in Figure 1.15. 

Code 100 lies at the point on the scale where the ratio 
between adjacent luminance values is 1%: The 
boundary between a region of code 100 samples and 
a region of code 101 samples is likely to be visible. 

As the pixel value decreases below 100, the difference 
in luminance between adjacent codes becomes increas- 
ingly perceptible: At code 25, the ratio between adja- 
cent luminance values is 4%. In a large area of smoothly 
varying shades of gray, these luminance differences are 
likely to be visible or even objectionable. Visible jumps 
in luminance produce artifacts known as contouring or 
banding. 

Linear-light codes above 100 suffer no banding arti- 
facts. However, as code value increases toward white, 
the codes have decreasing perceptual utility: At code 
200, the luminance ratio between adjacent codes is just 
0.5%, near the threshold of visibility. Codes 200 and 
201 are visually indistinguishable; code 201 could be 
discarded without its absence being noticed. 
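The arithmetic behind the "code 100" problem is simple: with luminance proportional to code value, the step from code n to n + 1 is a luminance ratio of (n + 1)/n. A Python sketch with a hypothetical function name, assuming the 1% threshold given above:

```python
def step_ratio(code: int) -> float:
    """Relative luminance step between code and code + 1 under linear-light
    coding: (code + 1) / code - 1, which simplifies to 1 / code."""
    return 1.0 / code


for code in (25, 100, 200):
    print(f"code {code}: adjacent-code step {step_ratio(code):.1%}")

# Lowest code whose step is at or below the ~1% visibility threshold.
print(next(c for c in range(1, 256) if step_ratio(c) <= 0.01))   # 100
```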
Figure 1.16 The "code 100"
problem is mitigated by using
more than 8 bits to represent
luminance. Here, 12 bits are
used, placing the top end of the
scale at 4095. However, the
majority of these 4096 codes
cannot be distinguished visually.

Conversely, monitor R'G'B' values
are proportional to reproduced
luminance raised to approximately
the 0.4-power.



The cathode ray tube (CRT) is the 
dominant display device for tele- 
vision receivers and for desktop 
computers. 



High-quality image reproduction requires a ratio of at 
least 30 to 1 between the luminance of white and the 
luminance of black, as I will explain in Contrast ratio, on 
page 197. In 8-bit linear-light coding, the ratio between 
the brightest luminance (code 255) and the darkest 
luminance that can be reproduced without banding 
(code 100) is only 2.55:1. Linear-light coding in 8 bits is 
unsuitable for high-quality images. 

This "code 100" problem can be mitigated by placing 
the top end of the scale at a code value higher than 
100, as sketched in Figure 1.16 in the margin. If lumi- 
nance is represented in 12 bits, white is at code 4095; 
the luminance ratio between code 100 and white 
reaches 40.95:1. However, the vast majority of those 
4096 code values cannot be distinguished visually; for 
example, codes 4001 through 4040 are visually indis- 
tinguishable. Rather than coding luminance linearly 
with a large number of bits, we can use many fewer 
code values assigned nonlinearly on a perceptual scale. 

If the threshold of vision behaved strictly according to 
the 1% relationship across the whole tone scale, then 
luminance could be coded logarithmically. For a con- 
trast ratio of 100:1, about 463 code values would be 
required, corresponding to about 9 bits. In video, 
for reasons to be explained in Luminance and lightness, 
on page 203, instead of modeling the lightness sensi- 
tivity of vision as a logarithmic function, we model it as 
a power function with an exponent of about 0.4. 
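The estimate for logarithmic coding follows from counting 1% steps. A Python sketch, assuming each code represents exactly 1.01 times the luminance of the code below it (function name hypothetical):

```python
import math


def log_codes(contrast_ratio: float, step: float = 0.01) -> int:
    """Codes needed to span contrast_ratio when each code represents
    (1 + step) times the luminance of the code below it."""
    return math.ceil(math.log(contrast_ratio) / math.log(1.0 + step))


codes = log_codes(100)                  # ln(100) / ln(1.01) is about 462.8
bits = math.ceil(math.log2(codes))
print(codes, bits)                      # 463 9
```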

The luminance of the red, green, or blue primary light 
produced by a monitor is proportional to voltage (or 
code value) raised to approximately the 2.5-power. This 
will be detailed in Chapter 23, Gamma, on page 257.

Amazingly, a CRT's transfer function is nearly the 
inverse of vision's lightness sensitivity! The nonlinear 
lightness response of vision and the power function 
intrinsic to a CRT combine to cause monitor voltage, or 
code value, to exhibit perceptual uniformity, as demon- 
strated in Figures 1.17 and 1.18 overleaf. 
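The near-cancellation can be checked numerically. The Python sketch below assumes a pure 2.5-power CRT response with no offset or black level (a simplification) and applies the CIE L* formula to the resulting luminance. At the midpoint code value, luminance is only about 18% of white, while L* is near 50. Function names are hypothetical.

```python
def crt_luminance(v: float, gamma: float = 2.5) -> float:
    """Relative luminance from an idealized CRT driven by code value v in [0, 1]."""
    return v ** gamma


def cie_lightness(y: float) -> float:
    """CIE 1976 L* from relative luminance y (white = 1)."""
    if y > (6 / 29) ** 3:
        return 116.0 * y ** (1 / 3) - 16.0
    return (29 / 3) ** 3 * y


for code in (0, 64, 128, 191, 255):
    y = crt_luminance(code / 255)
    print(f"code {code:3d}: Y = {y:5.3f}  L* = {cie_lightness(y):5.1f}")
```

L* tracks code value almost linearly (roughly 100 × code/255), while Y does not.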
[Figure 1.17 axis: pixel value, 8-bit scale.]

Figure 1.17 A grayscale ramp on a CRT display is generated by writing successive integer values 0
through 255 into the columns of a framebuffer. When processed by a digital-to-analog converter
(DAC) and presented to a CRT display, a perceptually uniform sweep of lightness results. A naive
experimenter might conclude (mistakenly!) that code values are proportional to intensity.

[Figure 1.18 scales: pixel value, 8-bit scale (top); CIE lightness, L* (middle); relative
luminance, 0 to 1 (bottom).]

Figure 1.18 Grayscale ramp augmented with CIE lightness (L*, on the middle scale) and CIE
relative luminance (Y, proportional to intensity, on the bottom scale). The point midway across
the screen has a lightness value midway between black and white. There is a near-linear relation-
ship between code value and lightness. However, luminance at the midway point is only about
18% of white! Luminance produced by a CRT is approximately proportional to the 2.5-power
of code value. Lightness is roughly proportional to the 0.4-power of luminance. Amazingly, these
relationships are near inverses. Their near-perfect cancellation has led many workers in video,
computer graphics, and digital image processing to misinterpret the term intensity, and to
underestimate the importance of nonlinear transfer functions.
See Bit depth requirements,
on page 269.



In video, this perceptually uniform relationship is
exploited by gamma correction circuitry incorporated
into every video camera. The R'G'B' values that result
from gamma correction (the values that are processed,
recorded, and transmitted in video) are roughly
proportional to the square root of scene intensity: R'G'B'
values are nearly perceptually uniform. Perceptual
uniformity allows as few as 8 bits to be used for each
R'G'B' component. Without perceptual uniformity, each
component would need 11 bits or more. Digital still
cameras adopt a similar approach.

Linear and nonlinear 

Image sensors generally convert photons to electrons: 
They produce signals whose amplitude is proportional 
to physical intensity. Video signals are usually processed 
through analog circuits that have linear response to 
voltage, or digital systems that are linear with respect to 
the arithmetic performed on the codewords. Video 
systems are often said to be linear. 

However, linearity in one domain cannot be carried 
across to another domain if a nonlinear function sepa- 
rates the two. In video, scene luminance is in a linear 
optical domain, and the video signal is in a linear elec- 
trical domain. However, the nonlinear gamma correc- 
tion imposed between the domains means that 
luminance and signal amplitude are not linearly related. 
When you ask a video engineer if his system is linear, he 
will say, "Of course!" - referring to linear voltage. When 
you ask an optical engineer if her system is linear, she 
will say, "Of course!" - referring to intensity, radiance, 
or luminance. However, if a nonlinear transform lies 
between the two systems, a linear operation performed 
in one domain is not linear in the other. 
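A two-sample Python example makes the point concrete. It assumes an idealized 2.5-power relation between signal value and luminance (names hypothetical). Averaging black and white signal values gives a signal of 0.5, which reproduces only about 18% luminance, whereas averaging the luminances gives 50%.

```python
GAMMA = 2.5   # assumed power relating signal value to luminance


def to_luminance(signal: float) -> float:
    """Idealized display: luminance = signal ** GAMMA."""
    return signal ** GAMMA


def to_signal(luminance: float) -> float:
    """Inverse: the gamma-corrected signal value for a given luminance."""
    return luminance ** (1.0 / GAMMA)


black, white = 0.0, 1.0   # signal values

# Averaging in the electrical (signal) domain...
mean_signal = (black + white) / 2
# ...versus averaging in the optical (luminance) domain.
mean_luminance = (to_luminance(black) + to_luminance(white)) / 2

print(f"{to_luminance(mean_signal):.3f}")   # luminance of averaged signal: 0.177
print(f"{mean_luminance:.3f}")              # averaged luminance: 0.500
print(f"{to_signal(mean_luminance):.3f}")   # signal that would produce it: 0.758
```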

If your computation involves perception, nonlinear 
representation may be required. If you perform a dis- 
crete cosine transform (DCT) on image data as part of 
image compression, as in JPEG, you should use 
nonlinear coding that exhibits perceptual uniformity, 
because you wish to minimize the perceptibility of the 
errors that will be introduced by the coding process. 
Luma and color difference components 

Some digital video equipment uses R'G'B' components 
directly. However, human vision has considerably less 
ability to sense detail in color information than in light- 
ness. Provided lightness detail is maintained, color 
detail can be reduced by subsampling, which is a form 
of filtering (or averaging). 

A color scientist might implement subsampling by
forming relative luminance as a weighted sum of linear
RGB tristimulus values, then imposing a nonlinear
transfer function approximating CIE lightness (L*). In
video, we depart from the theory of color science, and
implement an engineering approximation to be
introduced in Constant luminance, on page 75. Component
video systems convey image data as a luma
component, Y', approximating lightness, and two color
difference components (C_B and C_R in the digital
domain, or P_B and P_R in analog) that represent color
disregarding lightness. The color difference components
are subsampled to reduce their data rate. I will explain
Y'C_BC_R and Y'P_BP_R components in Introduction to
luma and chroma, on page 87.
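A deliberately crude Python sketch of horizontal 2:1 subsampling of one row of a color difference component, assuming a simple two-sample average as the filter. Practical systems use better filters and standardized sample siting, covered later in the book. The function name and sample values are hypothetical, and luma would be left at full resolution.

```python
def subsample_2to1(row: list[float]) -> list[float]:
    """Halve the number of color difference samples in a row by averaging
    adjacent pairs (a two-tap box filter followed by 2:1 decimation)."""
    if len(row) % 2:
        row = row + row[-1:]   # replicate the last sample if the count is odd
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]


c_b = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]   # hypothetical C_B samples
print(subsample_2to1(c_b))                    # [15.0, 35.0, 55.0]
```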

SDTV/HDTV 

Until recently, it was safe to use the term television, 
but the emergence of widescreen television, high- 
definition television, and other new systems introduces 
ambiguity into that unqualified word. Surprisingly, there 
is no broad agreement on definitions of standard-defini- 
tion television (SDTV) and high-definition television 
(HDTV). I classify as SDTV any video system whose
image totals fewer than 3/4 million pixels. I classify as
HDTV any video system with a native aspect ratio of
16:9 whose image totals 3/4 million pixels or more.
Digital television (DTV) encompasses digital SDTV and 
digital HDTV. Some people and organizations consider 
SDTV to imply component digital operation - that is, 
NTSC, PAL, and component analog systems are 
excluded. 
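The author's two definitions can be expressed as a small Python function (hypothetical name). Under these definitions, a system meeting neither test, such as a 4:3 system with more than 3/4 million pixels, falls outside both classes.

```python
from fractions import Fraction


def classify(width: int, height: int, native_aspect: Fraction) -> str:
    """Classify a video system by image size and native aspect ratio:
    SDTV below 3/4 million pixels; HDTV at 3/4 million or more with 16:9."""
    pixels = width * height
    if pixels < 750_000:
        return "SDTV"
    if native_aspect == Fraction(16, 9):
        return "HDTV"
    return "unclassified"


print(classify(720, 480, Fraction(4, 3)))     # SDTV
print(classify(1280, 720, Fraction(16, 9)))   # HDTV
print(classify(1024, 768, Fraction(4, 3)))    # unclassified
```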
Quantization

CHAPTER 2



Resolution properly refers to 
spatial phenomena; see page 65. 

It is a mistake to refer to a sample 
as having 8-bit resolution: Say 
quantization or precision instead. 



A signal whose amplitude takes a range of continuous 
values is quantized by assigning to each of several (or 
several hundred or several thousand) intervals of ampli- 
tude a discrete, numbered level. In uniform quantiza- 
tion, the steps between levels have equal amplitude. 
Quantization discards signal information lying between 
quantizer levels. Quantizer performance is character- 
ized by the extent of this loss. Figure 2.1 below shows, 
at the left, the transfer function of a uniform quantizer. 
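A minimal uniform quantizer in Python, assuming input normalized to the range 0 to 1 and round-half-up to the nearest level (function names hypothetical). The maximum error it introduces is half a step, which bounds the information that quantization discards.

```python
import math


def quantize(x: float, levels: int) -> int:
    """Uniform quantizer: map x in [0, 1] to the nearest of `levels` equally
    spaced levels, numbered 0 through levels - 1."""
    steps = levels - 1                       # eleven levels have ten steps
    x = min(max(x, 0.0), 1.0)                # clip to the quantizer's range
    return math.floor(x * steps + 0.5)       # round half up


def reconstruct(code: int, levels: int) -> float:
    """Amplitude represented by a level."""
    return code / (levels - 1)


# Largest error over a fine sweep of inputs: at most half a step (0.05 here).
worst = max(abs(reconstruct(quantize(i / 1000, 11), 11) - i / 1000)
            for i in range(1001))
print(quantize(0.5, 256), worst)
```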



To make a 100-foot-long fence with
fence posts every 10 feet, you need
11 posts, not ten! Take care to
distinguish levels (in the left-hand
portion of Figure 2.1, eleven) from
steps or risers (here, ten).



A truecolor image in computing is usually represented 
in R'G'B' components of 8 bits each, as I will explain on 
page 36. Each component ranges from 0 through 255, 
as sketched at the right of Figure 2.1: Black is at zero,
and white is at 255. Grayscale and truecolor data in 
computing is usually coded so as to exhibit approxi- 
mate perceptual uniformity, as I described on page 13: 
The steps are not proportional to intensity, but are 
instead uniformly spaced perceptually. The number of 
steps required depends upon properties of perception. 



Figure 2.1 Quantizer 
transfer function is 

shown at the left. The 
usual 0 to 255 range of 
quantized R'G'B' compo- 
nents in computing is 
sketched at the right. 
Decibels 



Eq 2.1 Power ratio, in decibels:

$$m = 10 \lg \frac{P_1}{P_2} \quad \text{(dB)}$$

Eq 2.2 Power ratio, with respect
to a reference power:

$$m = 10 \lg \frac{P}{P_{REF}} \quad \text{(dB)}$$



In following sections, I will describe signal amplitude, noise amplitude, and the ratio between these: the signal-to-noise ratio (SNR). In engineering, ratios such as SNR are usually expressed in logarithmic units. A power ratio of 10:1 is defined as a bel (B), in honor of Alexander Graham Bell. A more practical measure is one-tenth of a bel, a decibel (dB). This is a power ratio of $10^{0.1}$, or about 1.259. The ratio of a power $P_1$ to a power $P_2$, expressed in decibels, is given by Equation 2.1, where the symbol lg represents the base-10 logarithm. Often, signal power is given with respect to a reference power $P_{\mathrm{REF}}$, which must either be specified (often as a letter following dB) or be implied by the context. Reference values of 1 W (dBW) and 1 mW (dBm) are common. This situation is expressed in Equation 2.2. A doubling of power represents an increase of about 3.01 dB (usually written 3 dB). If power is multiplied by ten, the change is +10 dB; if reduced to a tenth, the change is -10 dB.
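Equations 2.1 and 2.2 translate directly into code. The sketch below simply evaluates them; the function names `power_db` and `dbm` are illustrative, not taken from any library.

```python
import math

def power_db(p1: float, p2: float) -> float:
    """Power ratio P1/P2 in decibels (Equation 2.1); lg is the base-10 log."""
    return 10 * math.log10(p1 / p2)

def dbm(p_watts: float) -> float:
    """Power relative to a 1 mW reference (Equation 2.2 with P_REF = 1 mW)."""
    return power_db(p_watts, 1e-3)

print(round(power_db(2, 1), 2))  # doubling power: about 3.01 dB
print(power_db(10, 1))           # ten times the power: +10 dB
print(power_db(0.1, 1))          # one tenth the power: -10 dB
print(round(10 ** 0.1, 3))       # the power ratio of one decibel: about 1.259
print(dbm(1.0))                  # 1 W expressed relative to 1 mW: +30 dBm
```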



Eq 2.3 Power ratio, in decibels, as a function of voltage:

$$m = 20 \lg \frac{V_1}{V_2} \quad \text{(dB)}$$



Consider a cable conveying a 100 MHz radio frequency signal. After 100 m of cable, power has diminished to some fraction, perhaps 1/8, of its original value. After another 100 m, power will be reduced by the same fraction again. Rather than expressing this cable attenuation as a unitless fraction 0.125 per 100 m, we express it as 9 dB per 100 m; power at the end of 1 km of cable is -90 dB referenced to the source power.
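The cable arithmetic shows why decibels are convenient: where fractions multiply, decibels add. A short check using the example figure of 0.125 per 100 m from the text:

```python
import math

fraction_per_100m = 0.125                      # power remaining after each 100 m (example value)
loss_db_per_100m = 10 * math.log10(fraction_per_100m)
segments = 1000 // 100                         # 1 km of cable is ten 100 m segments
total_db = loss_db_per_100m * segments         # decibels add ...

print(round(loss_db_per_100m, 2))              # about -9.03 dB per 100 m
print(round(total_db, 1))                      # about -90.3 dB after 1 km
# ... giving the same answer as multiplying the fractions first:
print(round(10 * math.log10(fraction_per_100m ** segments), 1))
```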



Table 2.1 Decibel examples

Voltage ratio   Decibels
10              20 dB
2               6 dB
1.122           1 dB
1.0116          0.1 dB
1               0 dB
0.5             -6 dB
0.1             -20 dB
0.01            -40 dB
0.001           -60 dB



The decibel is defined as a power ratio. If a voltage source is applied to a constant impedance, and the voltage is doubled, current doubles as well, so power increases by a factor of four. More generally, if voltage (or current) into a constant impedance changes by a ratio $r$, power changes by the ratio $r^2$. (The log of $r^2$ is $2 \log r$.) To compute decibels from a voltage ratio, use Equation 2.3. In digital signal processing (DSP), digital code levels are treated equivalently to voltage; the decibel in DSP is based upon voltage ratios.

Table 2.1 in the margin gives numerical examples of 
decibels used for voltage ratios. 
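The entries of Table 2.1 follow from Equation 2.3. This sketch (with an illustrative function name, `voltage_db`) regenerates the table and confirms that doubling voltage into a constant impedance is the same number of decibels as quadrupling power.

```python
import math

def voltage_db(v1: float, v2: float = 1.0) -> float:
    """Decibels from a voltage (or DSP code-value) ratio: Equation 2.3."""
    return 20 * math.log10(v1 / v2)

for ratio in (10, 2, 1.122, 1.0116, 1, 0.5, 0.1, 0.01, 0.001):
    print(f"{ratio:>7}  {voltage_db(ratio):6.1f} dB")

# Doubling voltage quadruples power, so both give about 6.02 dB.
print(round(voltage_db(2), 2), round(10 * math.log10(4), 2))
```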
The oct in octave refers to the
eight whole tones in music, do, re, 
me, fa, sol, la, ti, do, that cover 
a 2:1 range of frequency. 

A stop in photography is a 2:1 
ratio of illuminance. 
Figure 2.2 Peak-to-peak, peak, and RMS values are measured as the total excursion, half the total excursion, and the square root of the average of squared values, respectively. Here, a noise component is shown.
A 2:1 ratio of frequencies is an octave. When voltage halves with each doubling in frequency, an electronics engineer refers to this as a loss of 6 dB per octave. If voltage halves with each doubling, then it is reduced to one-tenth at ten times the frequency; a 10:1 ratio of quantities is a decade, so 6 dB/octave is equivalent to 20 dB/decade. (The base-2 log of 10 is very nearly 3⅓.)
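The equivalence of 6 dB/octave and 20 dB/decade can be checked numerically: a decade spans log2(10) octaves, and multiplying by the per-octave loss gives exactly 20 dB, because 20 lg 2 multiplied by log2 10 is 20.

```python
import math

db_per_octave = 20 * math.log10(0.5)    # voltage halves: about -6.02 dB
octaves_per_decade = math.log2(10)      # about 3.32, very nearly 3 1/3
db_per_decade = db_per_octave * octaves_per_decade

print(round(db_per_octave, 2), round(octaves_per_decade, 3), round(db_per_decade, 6))
```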

Noise, signal, sensitivity 

Analog electronic systems are inevitably subject to noise 
introduced from thermal and other sources. Thermal 
noise is unrelated to the signal being processed. 

A system may also be subject to external sources of 
interference. As signal amplitude decreases, noise and 
interference make a larger relative contribution. 

Processing, recording, and transmission may introduce 
noise that is uncorrelated to the signal. In addition, 
distortion that is correlated to the signal may be intro- 
duced. As it pertains to objective measurement of the 
performance of a system, distortion is treated like noise; 
however, a given amount of distortion may be more or 
less perceptible than the same amount of noise. Distor- 
tion that can be attributed to a particular process is 
known as an artifact, particularly if it has a distinctive 
perceptual effect. 

In video, signal-to-noise ratio (SNR) is the ratio of the 
peak-to-peak amplitude of a specified signal, often the 
reference amplitude or the largest amplitude that can 
be carried by a system, to the root mean square (RMS) 
magnitude of undesired components including noise 
and distortion. (It is sometimes called PSNR, to empha- 
size peak signal; see Figure 2.2 in the margin.) SNR is 
expressed in units of decibels. In many fields, such as 
audio, SNR is specified or measured in a physical (intensity) domain. In video, SNR usually applies to gamma-corrected components R', G', B', or Y' that are in the perceptual domain; so, SNR correlates with perceptual performance.
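As a sketch of the video definition of SNR (peak-to-peak signal over RMS noise), assume a reference excursion normalized to 1.0 and a hypothetical Gaussian noise component whose RMS value is 1% of that excursion; the SNR should then come out near 40 dB. The helper names are illustrative.

```python
import math
import random

def rms(values):
    """Square root of the mean of squared values (Figure 2.2)."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def snr_db(peak_to_peak_signal: float, noise) -> float:
    """Video-style SNR: peak-to-peak signal over RMS noise, in decibels."""
    return 20 * math.log10(peak_to_peak_signal / rms(noise))

rng = random.Random(0)                                   # fixed seed for repeatability
noise = [rng.gauss(0.0, 0.01) for _ in range(100_000)]   # hypothetical noise samples
print(round(snr_db(1.0, noise), 1))                      # close to 40 dB
```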

Sensitivity refers to the minimum source power that 
achieves acceptable (or specified) SNR performance. 
Eq 2.4 Theoretical SNR limit for a k-step quantizer:

$$20 \lg \left( k \sqrt{12} \right) \quad \text{(dB)}$$

The factor of root-12, about 11 dB, accounts for the ratio between peak-to-peak and RMS; for details, see Schreiber (cited below).



Some people use the word dither to refer to this technique; other people use the term for schemes that involve spatial distribution of the noise. The technique was first described by Roberts, L.G., "Picture coding using pseudo-random noise," in IRE Trans. IT-8 (2): 145-154 (1962).

It is nicely summarized in Schreiber, William F., Fundamentals of Electronic Imaging Systems, Third Edition (Berlin: Springer-Verlag, 1993).



Quantization error 

A quantized signal takes only discrete, predetermined 
levels: Compared to the original continuous signal, 
quantization error has been introduced. This error is 
correlated with the signal, and is properly called 
distortion. However, classical signal theory deals with 
the addition of noise to signals. Provided that each quantizer step is small compared to signal amplitude, we can consider the loss of signal in a quantizer as addition of an equivalent amount of noise instead: Quantization diminishes signal-to-noise ratio. The theoretical SNR limit of a k-step quantizer is given by Equation 2.4.
Eight-bit quantization, common in video, has 
a theoretical SNR limit (peak-to-peak signal to RMS 
noise) of about 56 dB. 
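The sketch below simply evaluates Equation 2.4 for a few step counts, taking k as 2 raised to the number of bits. With k = 256 the formula evaluates to roughly 59 dB; each added bit raises the limit by 20 lg 2, about 6 dB.

```python
import math

def quantizer_snr_limit_db(k: int) -> float:
    """Theoretical SNR limit of a k-step quantizer (Equation 2.4): 20 lg(k * sqrt(12))."""
    return 20 * math.log10(k * math.sqrt(12))

print(round(20 * math.log10(math.sqrt(12)), 2))  # the root-12 term alone: about 10.79 dB
for bits in (8, 10):
    print(bits, round(quantizer_snr_limit_db(2 ** bits), 1))
```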

If an analog signal has very little noise, then its quan- 
tized value can be nearly exact when near a step, but 
can exhibit an error of nearly ± 1/2 a step when the 
analog signal is midway between quantized levels. In 
video, this situation can cause the reproduced image to 
exhibit noise modulation. It is beneficial to introduce, 
prior to quantization, roughly ± 1/2 of a quantizer step's 
worth of high-frequency random or pseudorandom 
noise to avoid this effect. This introduces a little noise 
into the picture, but this noise is less visible than low- 
frequency "patterning" of the quantization that would 
be liable to result without it. SNR is slightly degraded, 
but subjective picture quality is improved. Historically, 
video digitizers implicitly assumed that the input signal 
itself arrived with sufficient analog noise to perform this 
function; nowadays, analog noise levels are lower, and 
the noise should be added explicitly at the digitizer. 
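A minimal sketch of the effect of adding about ±1/2 step of noise before quantization, assuming a flat analog input that sits 0.3 step above a code. Averaging many samples here stands in, as a simplification, for the spatial integration performed by vision. Without the noise, every sample lands on the same code and the 0.3-step error is systematic; with it, the average tracks the input. The helper names are hypothetical.

```python
import random

def quantize(x: float) -> int:
    """Round an input expressed in quantizer steps to the nearest code."""
    return round(x)

def quantize_dithered(x: float, rng: random.Random) -> int:
    """Add uniform noise of +/- 1/2 step before rounding (hypothetical helper)."""
    return round(x + rng.uniform(-0.5, 0.5))

rng = random.Random(1)   # fixed seed so the run is repeatable
level = 100.3            # flat analog input, 0.3 step above code 100
n = 10_000

plain = [quantize(level) for _ in range(n)]
dithered = [quantize_dithered(level, rng) for _ in range(n)]

print(sum(plain) / n)     # 100.0: every sample lands on code 100
print(sum(dithered) / n)  # close to 100.3 when averaged over many samples
```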

The degree to which noise in a video signal is visible - 
or objectionable - depends upon the properties of 
vision. To minimize noise visibility, we digitize a signal 
that is a carefully chosen nonlinear function of lumi- 
nance (or tristimulus values). The function is chosen so 
that a given amount of noise is approximately equally 
perceptible across the whole tone scale from black to 
white. This concept was outlined in Nonlinear image 
coding, on page 12; in the sections to follow, linearity 
and perceptual uniformity are elaborated. 
Figure 2.3 Audio taper. (Plot axis: sound pressure level, relative.)

Linearity 

Electronic systems are often expected to satisfy the principle of superposition; in other words, they are expected to exhibit linearity. A system g is linear if and only if (iff) it satisfies both of these conditions:

$$\begin{aligned} g(a \cdot x) &= a \cdot g(x) \quad \text{[for scalar } a\text{]} \\ g(x + y) &= g(x) + g(y) \end{aligned} \qquad \text{Eq 2.5}$$

The function g can encompass an entire system: A system is linear iff the sum of the individual responses of the system to any two signals is identical to its response to the sum of the two. Linearity can pertain to steady-state response, or to the system's temporal response to a changing signal.
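The two conditions of Equation 2.5 can be tested numerically on sample inputs. This sketch (with a hypothetical helper, `is_linear`) can only disprove linearity or lend evidence for it, since it checks finitely many points. A gain stage passes; a power function of the kind used for gamma correction fails.

```python
import math

def is_linear(g, samples, scalars, tol=1e-9):
    """Check both conditions of Equation 2.5 on finitely many inputs.

    Passing is evidence of linearity, not proof; one failure disproves it.
    """
    for x in samples:
        for a in scalars:
            if not math.isclose(g(a * x), a * g(x), rel_tol=tol, abs_tol=tol):
                return False
        for y in samples:
            if not math.isclose(g(x + y), g(x) + g(y), rel_tol=tol, abs_tol=tol):
                return False
    return True

def gain(x):
    """An amplifier with a gain of 2: linear."""
    return 2.0 * x

def power(x):
    """A gamma-style power function: nonlinear."""
    return x ** 0.45

xs = [0.1, 0.25, 0.5]
print(is_linear(gain, xs, [0.5, 3.0]))   # True
print(is_linear(power, xs, [0.5, 3.0]))  # False
```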

Linearity is a very important property in mathematics, 
signal processing, and video. Many electronic systems 
operate in the linear intensity domain, and use signals 
that directly represent physical quantities. One example 
is compact audio disc (CD) coding: Sound pressure level 
(SPL), proportional to physical intensity, is quantized 
linearly into 16-bit samples. 

Human perception, though, is nonlinear. Image signals 
that are captured, recorded, processed, or transmitted 
are often coded in a nonlinear, perceptually uniform 
manner that optimizes perceptual performance. 

Perceptual uniformity 

A coding system is perceptually uniform if a small 
perturbation to the coded value is approximately 
equally perceptible across the range of that value. If the 
volume control on your radio were physically linear, the 
logarithmic nature of loudness perception would place 
all of the perceptual "action" of the control at the 
bottom of its range. Instead, the control is designed to 
be perceptually uniform. Figure 2.3, in the margin, 
shows the transfer function of a potentiometer with 
standard audio taper: Rotating the knob 10 degrees 
produces a similar perceptual increment in volume 
throughout the range of the control. This is one of 
many examples of perceptual considerations embedded 
into the engineering of an electronic system. 
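The text does not give a formula for the audio taper. As an illustration only, assume an idealized taper in which gain in decibels is proportional to knob rotation over an assumed 60 dB range. Equal rotations then produce equal decibel increments, which is the property that makes the control perceptually uniform. Real potentiometers only approximate such a curve; this idealized one never quite reaches zero.

```python
import math

def audio_taper(rotation: float, range_db: float = 60.0) -> float:
    """Idealized audio taper: relative amplitude for a rotation in 0..1.

    Gain in dB is proportional to rotation (an assumption for illustration).
    """
    return 10 ** ((rotation - 1.0) * range_db / 20)

steps = [audio_taper(r / 10) for r in range(11)]
increments_db = [20 * math.log10(b / a) for a, b in zip(steps, steps[1:])]
print([round(d, 3) for d in increments_db])  # each tenth of a turn adds 6 dB
```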
Bellamy, John C., Digital
Telephony, Second Edition 
(New York: Wiley, 1991), 
98-111 and 472-476. 



For engineering purposes, we consider R', G', and B' to be encoded with identical transfer functions. In practice, encoding gain differs owing to white balance. Also, the encoding transfer functions may be adjusted differently for artistic purposes during image capture or postproduction.



Excursion in analog 480i systems is often expressed in IRE units, which I will introduce on page 327.



As I have mentioned, CD audio is coded linearly, with 16 bits per sample. Audio for digital telephony usually has just 8 bits per sample; this necessitates nonlinear coding. Two coding laws are in use, A-law and μ-law; both of these involve decoder transfer functions that are comparable to bipolar versions of Figure 2.3.
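The sketch below uses the continuous μ-law curve with μ = 255. This is an assumption for illustration: the telephony standard itself uses a segmented, piecewise-linear approximation with its own bit layout, which is not reproduced here. The 8-bit quantization uses sign plus magnitude codes of ±127, also a simplification. The point is that a quiet signal that linear 8-bit coding would round to zero survives companding with modest relative error. All function names are hypothetical.

```python
import math

MU = 255.0  # mu parameter of the continuous mu-law curve

def mu_law_compress(x: float) -> float:
    """Encoder: linear amplitude in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y: float) -> float:
    """Decoder: the inverse curve, a bipolar shape comparable to Figure 2.3."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)

def encode_8bit(x: float) -> int:
    """Simplified 8-bit code: companded value scaled to +/-127 and rounded."""
    return round(mu_law_compress(x) * 127)

def decode_8bit(code: int) -> float:
    return mu_law_expand(code / 127)

quiet = 0.001
print(round(quiet * 127))                # 0: linear 8-bit coding loses this signal
print(decode_8bit(encode_8bit(quiet)))   # about 0.00096: companding preserves it
```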

In video (including motion-JPEG and MPEG), and in 
digital photography (including JPEG/JFIF), R'G'B' 
components are coded in a perceptually uniform 
manner. Noise visibility is minimized by applying 
a nonlinear transfer function - gamma correction - to 
each tristimulus value sensed from the scene. The 
transfer function standardized for studio video is 
detailed in Rec. 709 transfer function, on page 263. In 
digital still cameras, a transfer function resembling that 
of sRGB is used; it is detailed in sRGB transfer function, 
on page 267. Identical nonlinear transfer functions are 
applied to the red, green, and blue components; in 
video, the nonlinearity is subsequently incorporated into the luma and chroma (Y'CBCR) components. The approximate inverse transfer function is imposed at the display device: A CRT has a nonlinear transfer function from voltage (or code value) to luminance; that function is comparable to Figure 2.3 on page 21. Nonlinear coding is the central topic of Chapter 23, Gamma, on page 257.

Headroom and footroom 

Excursion (or colloquially, swing) refers to the range of 
a signal - the difference between its maximum and 
minimum levels. In video, reference excursion is the 
range between standardized reference white and refer- 
ence black levels. 



In high-quality video, it is necessary to preserve tran- 
sient signal undershoots below black, and overshoots 
above white, that are liable to result from processing by 
digital and analog filters. Studio video standards provide 
footroom below reference black, and headroom above 
reference white. Headroom allows code values that 
exceed reference white; therefore, you should distin- 
guish between reference white and peak white. 
Figure 2.4 Footroom and headroom are provided in digital video standards to accommodate filter undershoot and overshoot. For processing, black is assigned to code 0; in an 8-bit system, R', G', B', or luma (Y') range 0 through 219. At an 8-bit interface according to Rec. 601, an offset of +16 is added (indicated in italics). Interface codes 0 and 255 are reserved for synchronization; those codes are prohibited in video data.

[Figure 2.4 levels, as interface code and processing value: 254 and +238 (top of headroom); 235 and +219 (reference white); 16 and 0 (reference black); 1 and -15 (bottom of footroom).]


I represent video signals on an abstract scale where reference black has zero value independent of coding range. I assign white to an appropriate value, often 1, but sometimes other values such as 160, 219, 255, 640, or 876. A sample is ordinarily represented in hardware as a fixed-point integer with a limited number of bits (often 8 or 10). In computing, R'G'B' components of 8 bits each typically range from 0 through 255; the right-hand sketch of Figure 2.1 on page 17 shows a suitable quantizer.

Eight-bit studio standards have 219 steps between reference black and reference white. Footroom of 15 codes, and headroom of 19 codes, is available. For no good reason, studio standards specify asymmetrical footroom and headroom. Figure 2.4 above shows the standard coding range for R', G', or B', or luma.

At the hardware level, an 8-bit interface is considered 
to convey values 0 through 255. At an 8-bit digital 
video interface, an offset of +16 is added to the code 
values shown in Figure 2.4: Reference black is placed at 
code 16, and white at 235. I consider the offset to be 
added or removed at the interface, because a signed 
representation is necessary for many processing operations (such as changing gain). However, hardware
designers often consider digital video to have black at 
code 16 and white at 235; this makes interface design 
easy, but makes signal arithmetic design more difficult. 
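The luma (or R', G', B') coding of Figure 2.4 can be sketched as follows, assuming an abstract input scale on which reference black is 0 and reference white is 1. The offset of +16 is applied at the interface, and results are clipped to 1 through 254 because codes 0 and 255 are reserved. The function names are hypothetical.

```python
def luma_to_interface_8bit(y: float) -> int:
    """Abstract luma (0 = reference black, 1 = reference white) to an
    8-bit Rec. 601 interface code: 219-code excursion plus a +16 offset,
    clipped to 1..254 (footroom to -15/219, headroom to +238/219)."""
    code = round(y * 219) + 16
    return min(max(code, 1), 254)

def interface_to_luma(code: int) -> float:
    """Remove the +16 offset and rescale to the abstract 0-to-1 scale."""
    return (code - 16) / 219

print(luma_to_interface_8bit(0.0), luma_to_interface_8bit(1.0))  # 16 235
```

Keeping the signed, offset-free value for arithmetic, and adding the offset only at the interface, mirrors the convention described above.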



CHAPTER 2 



QUANTIZATION 



23 



Figure 2.5 Mid-tread quantizer for CB and CR bipolar signals allows zero chroma to be represented exactly. (Mid-riser quantizers are rarely used in video.) For processing, CB and CR abstract values range ±112. At an 8-bit studio video interface according to Rec. 601, an offset of +128 is added, indicated by the values in italics. Interface codes 0 and 255 are reserved for synchronization, as they are for luma.

[Figure 2.5 levels, as interface code and processing value: 254 and +126; 240 and +112; 128 and 0 (mid-tread zero); 16 and -112; 1 and -127.]



Figure 2.4 showed a quantizer for a unipolar signal such as luma. CB and CR are bipolar signals, ranging positive and negative. For CB and CR it is standard to use a mid-tread quantizer, such as the one in Figure 2.5 above, so that zero chroma has an exact representation. For processing, a signed representation is necessary; at a studio video interface, it is standard to scale 8-bit color difference components to an excursion of 224, and add an offset of +128. Unfortunately, the reference excursion of 224 for CB or CR is different from the reference excursion of 219 for Y'.
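A corresponding sketch for chroma, assuming an abstract CB or CR input normalized to the range -0.5 to +0.5 (so that ±0.5 maps to ±112 codes). Because the quantizer is mid-tread, zero chroma lands exactly on code 128. The function name is hypothetical.

```python
def chroma_to_interface_8bit(c: float) -> int:
    """Abstract CB or CR (-0.5..+0.5, assumed) to an 8-bit Rec. 601 interface
    code: 224-code excursion (+/-112) plus a +128 offset, clipped to 1..254."""
    code = round(c * 224) + 128
    return min(max(code, 1), 254)

print(chroma_to_interface_8bit(-0.5), chroma_to_interface_8bit(0.0),
      chroma_to_interface_8bit(0.5))  # 16 128 240
```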

R'G'B' or Y'CBCR components of 8 bits each suffice for broadcast-quality distribution. However, if a video signal must be processed many times, say for inclusion in a multiple-layer composited image, then roundoff errors are liable to accumulate. To avoid roundoff error, recording equipment, and interfaces between equipment, should carry 10 bits each of Y'CBCR. Ten-bit studio interfaces have the reference levels of Figures 2.4 and 2.5 multiplied by 4; the extra two bits are appended as least-significant bits to provide increased precision. Intermediate results within equipment may need to be maintained to 12, 14, or even 16 bits.
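Appending two least-significant bits is a multiplication by 4, so the 10-bit reference levels follow directly from the 8-bit ones. In this sketch the helper names are hypothetical, and the rounding rule used to return to 8 bits (round half up) is one choice among several.

```python
def to_10bit(code8: int) -> int:
    """Extend an 8-bit studio code to 10 bits by appending two LSBs (times 4)."""
    return code8 << 2

def to_8bit(code10: int) -> int:
    """Return a 10-bit code to 8 bits, rounding half up (one possible rule)."""
    return min((code10 + 2) >> 2, 255)

# Luma black and white, chroma zero, and chroma extremes at 10 bits:
print([to_10bit(c) for c in (16, 235, 128, 240)])  # [64, 940, 512, 960]
```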
Brightness and contrast controls

3



This chapter introduces the brightness and contrast controls of video. Beware: Their names are sources of confusion! These operations are normally effected in the nonlinear domain, that is, on gamma-corrected signals, and are normally applied to each of the red, green, and blue components simultaneously.

The contrast control applies a scale factor - in elec- 
trical terms, a gain adjustment - to R'G'B' components. 
(On processing equipment, it is called video level; on 
some television receivers, it is called picture.) Figure 3.1 
below sketches the effect of the contrast control, 
relating video signal input to light output at the display. 
The contrast control affects the luminance that is 
reproduced for the reference white input signal; it 
affects lower signal levels proportionally, ideally having 
no effect on zero signal (reference black). Here I show contrast altering the y-axis (luminance) scaling; however, owing to the properties of the display's 2.5-power function, suitable scaling of the x-axis (the video signal) would have an equivalent effect.
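The equivalence of x-axis and y-axis scaling follows from the power function: scaling the signal by k scales the 2.5-power output by k raised to 2.5. This sketch assumes an idealized display whose relative luminance is exactly the 2.5-power of the signal; the function names are illustrative.

```python
GAMMA = 2.5  # display power-function exponent assumed in this chapter

def display_luminance(signal: float) -> float:
    """Idealized display: relative luminance is the 2.5-power of the signal."""
    return max(signal, 0.0) ** GAMMA

def contrast(signal: float, gain: float) -> float:
    """Contrast (video level): scale the gamma-corrected signal."""
    return gain * signal

v, k = 0.6, 1.2
# Scaling the signal by k scales luminance by (approximately, in floating point) k ** 2.5 ...
print(display_luminance(contrast(v, k)) / display_luminance(v), k ** GAMMA)
# ... and zero signal (reference black) still produces zero luminance.
print(display_luminance(contrast(0.0, k)))  # 0.0
```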



Figure 3.1 Contrast control 
determines the luminance 
(proportional to intensity) 
produced for white, with inter- 
mediate values toward black 
being scaled appropriately. In a 
well-designed monitor, adjusting 
contrast maintains the correct 
black setting - ideally, zero input 
signal produces zero luminance 
at any contrast setting. 
Figure 3.2 Brightness control has the effect of sliding the black-to-white video signal scale left and right along the 2.5-power function of the display. Here, brightness is set too high; a significant amount of luminance is produced at zero video signal level. No video signal can cause true black to be displayed, and the picture content rides on an overall pedestal of gray. Contrast ratio is degraded.



Figure 3.3 Brightness control is set 
correctly when the reference black 
video signal level is placed precisely at 
the point of minimum perceptible 
light output at the display. In a 
perfectly dark viewing environment, 
the black signal would produce zero 
luminance; in practice, however, the 
setting is dependent upon the 
amount of ambient light in the 
viewing environment. 




Figure 3.4 Brightness control set 
too low causes a range of input 
signal levels near black to be repro- 
duced "crushed" or "swallowed," 
reproduced indistinguishably from 
black. A cinematographer might describe this situation as "lack of details in the shadows"; however, all information in the shadows is lost, not just the details.




When brightness is set as high as 
indicated in Figure 3.2, the effec- 
tive power law exponent is lowered 
from 2.5 to about 2.3; when set as 
low as in Figure 3.4, it is raised to 
about 2.7. For the implications of 
this fact, see page 84. 



The brightness control - more sensibly called black 
level - effectively slides the black-to-white range of the 
video signal along the power function of the display. It 
is implemented by introducing an offset - in electrical 
terms, a bias - into the video signal. Figure 3.3 (middle) 
sketches the situation when the brightness control is 
properly adjusted: Reference black signal level produces 
zero luminance. Misadjustment of brightness is 
a common cause of poor displayed-image quality. If 
brightness is set too high, as depicted in Figure 3.2 
(top), contrast ratio suffers. If brightness is set too low, 
as depicted in Figure 3.4 (bottom), picture information 
near black is lost. 
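The three brightness cases of Figures 3.2 through 3.4 can be mimicked with an offset added ahead of an idealized 2.5-power display. In this sketch the offsets of ±0.1 (as a fraction of full scale) are hypothetical, and negative values are clamped to zero because light output cannot go negative.

```python
GAMMA = 2.5

def displayed(signal: float, black_level: float) -> float:
    """Idealized display with a brightness (black level) offset applied first."""
    return max(signal + black_level, 0.0) ** GAMMA

too_high, correct, too_low = 0.1, 0.0, -0.1   # hypothetical settings

print(displayed(0.0, too_high) > 0)            # True: black rides on a gray pedestal
print(displayed(0.0, correct))                 # 0.0: reference black gives zero luminance
print(displayed(0.05, too_low) == displayed(0.0, too_low))  # True: near-black is crushed
```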
SMPTE RP 71, Setting Chromaticity and Luminance of White for
Color Television Monitors Using 
Shadow-Mask Picture Tubes. 



LCD: liquid crystal display 



To set brightness (or black level), first display a picture 
that is predominantly or entirely black. Set the control 
to its minimum, then increase its level until the display 
just begins to show a hint of dark gray. The setting is 
somewhat dependent upon ambient light. Modern 
display equipment is sufficiently stable that frequent 
adjustment is unnecessary. 

Once brightness is set correctly, contrast can be set to 
whatever level is appropriate for comfortable viewing, 
provided that clipping and blooming are avoided. In the 
studio, the contrast control can be used to achieve the 
standard luminance of white, typically 103 cd·m⁻².

In addition to having user controls that affect R'G'B' 
components equally, computer monitors, video moni- 
tors, and television receivers have separate red, green, 
and blue internal adjustments of gain (called drive) and 
offset (called screen, or sometimes cutoff). In a 
display, brightness (or black level) is normally used to 
compensate for the display, not the input signal, and 
thus should be implemented following gain control. 

In processing equipment, it is sometimes necessary to 
correct errors in black level in an input signal while 
maintaining unity gain: The black level control should 
be implemented prior to the application of gain, and 
should not be called brightness. Figures 3.5 and 3.6 
overleaf plot the transfer functions of contrast and 
brightness controls in the video signal path, disre- 
garding the typical 2.5-power function of the display. 

LCD displays have controls labeled brightness and 
contrast, but these controls have different functions 
than the like-named controls of a CRT display. In an 
LCD, the brightness control, or the control with that 
icon, typically alters the backlight luminance. 

Brightness and contrast controls in desktop graphics 

Adobe's Photoshop software established the de facto 
effect of brightness and contrast controls in desktop 
graphics. Photoshop's brightness control is similar to 
the brightness control of video; however, Photoshop's 
contrast differs dramatically from that of video. 
Figure 3.5 Brightness (or black level) control in video applies an offset, roughly ±20% of full scale, to R'G'B' components. Though this function is evidently a straight line, the input and output video signals are normally in the gamma-corrected (perceptual) domain; the values are not proportional to intensity. At the minimum and maximum settings, I show clipping to the Rec. 601 footroom of -15/219 and headroom of 238/219. (Light power cannot go negative, but electrical and digital signals can.)




Figure 3.6 Contrast (or video level) control in video applies a gain factor between roughly 0.5 and 2.0 to R'G'B' components. The output signal clips if the result would fall outside the range allowed for the coding in use. Here I show clipping to the Rec. 601 headroom limit.
Figure 3.7 Brightness control in Photoshop applies an offset of -100 to +100 to R'G'B' components ranging from 0 to 255. If a result falls outside the range 0 to 255, it saturates; headroom and footroom are absent. The function is evidently linear, but depending upon the image coding standard in use, the input and output values are generally nonlinearly related to luminance (or tristimulus values).




Figure 3.8 Contrast control in Photoshop subtracts 127.5 from the input, applies a gain factor between zero (for contrast setting of -100) and infinity (for contrast setting of +100), then adds 127.5, saturating if the result falls outside the range 0 to 255. This operation is very different from the action of the contrast control in video.
The transfer functions of Photoshop's controls are sketched in Figures 3.7 and 3.8. R', G', and B' component values in Photoshop are presented to the user as values between 0 and 255. Brightness and contrast controls have sliders ranging ±100.

Brightness effects an offset between -100 and +100 on the R', G', and B' components. Any result outside the range 0 to 255 clips to the nearest extreme value, 0 or 255. Photoshop's brightness control is comparable to that of video, but its range (roughly ±40% of full scale) is greater than the typical video range (of about ±20%).

Photoshop's contrast control follows the application 
of brightness; it applies a gain factor. Instead of leaving 
reference black (code zero) fixed, as a video contrast 
control does, Photoshop "pivots" the gain adjustment 
around the midscale code. The transfer function for 
various settings of the control is graphed in Figure 3.8. 



Figure 3.9 Photoshop contrast control's gain factor depends upon contrast setting according to this function.

Eq 3.1

$$k = \begin{cases} 1 + \dfrac{c}{100}, & -100 \le c < 0 \\[1ex] \dfrac{1}{1 - \dfrac{c}{100}}, & 0 \le c < 100 \end{cases}$$



The power function that relates 
Macintosh QuickDraw R'G'B' 
components to intensity is 
explained on page 273. 






The gain available from Photoshop's contrast control 
ranges from zero to infinity, far wider than video's 
typical range of 0.5 to 2. The function that relates 
Photoshop's contrast to gain is graphed in Figure 3.9 
in the margin. From the -100 setting to the 0 setting, 
gain ranges linearly from zero through unity. From the 0 
setting to the +100 setting, gain ranges nonlinearly 
from unity to infinity, following a reciprocal curve; the 
curve is described by Equation 3.1. 
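Putting Figures 3.7 through 3.9 together, the controls as described here can be sketched as follows: brightness offset and clip first, then gain from Equation 3.1 pivoted about 127.5, then clip again. This is a sketch of the behavior described in this chapter, not Adobe's code; integer rounding of results is omitted, and the +100 contrast setting (infinite gain) is excluded. The function names are hypothetical.

```python
def photoshop_gain(c: float) -> float:
    """Contrast gain k for a setting c, following Equation 3.1."""
    if not -100 <= c < 100:
        raise ValueError("contrast setting must satisfy -100 <= c < 100")
    if c < 0:
        return 1 + c / 100        # linear: 0 at -100, 1 at 0
    return 1 / (1 - c / 100)      # reciprocal: 1 at 0, growing without bound

def clip255(v: float) -> float:
    return min(max(v, 0.0), 255.0)

def brightness_contrast(code: float, brightness: float, contrast: float) -> float:
    """Offset and clip, then gain pivoted about midscale (127.5), then clip."""
    v = clip255(code + brightness)
    return clip255((v - 127.5) * photoshop_gain(contrast) + 127.5)

print(brightness_contrast(200, 0, 50))   # gain 2 pushes 200 past 255, so it clips
print(brightness_contrast(100, 0, -100)) # gain 0 collapses everything to midscale
```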

In desktop graphics applications such as Photoshop, 
image data is usually coded in a perceptually uniform 
manner, comparable to video R'G'B'. On a PC, R'G'B' components are by default proportional to the 0.4-power of reproduced luminance (or tristimulus) values. On Macintosh computers, QuickDraw R'G'B' components are by default proportional to the 0.58-power of displayed luminance (or tristimulus). However, on both PC and Macintosh computers, the
user, system software, or application software can set 
the transfer function to nonstandard functions - 
perhaps even linear-light coding - as I will describe in 
Gamma, on page 257. 
computing

4



This chapter places video into the context of 
computing. Images in computing are represented in 
three forms, depicted schematically in the three rows of 
Figure 4.1 overleaf: symbolic image description, raster 
image, and compressed image. 

• A symbolic image description does not directly 
contain an image, but contains a high-level 2-D or 3-D 
geometric description of an image, such as its objects 
and their properties. A two-dimensional image in this 
form is sometimes called a vector graphic, though its 
primitive objects are usually much more complex than 
the straight-line segments suggested by the word vector. 

• A raster image enumerates the grayscale or color 
content of each pixel directly, in scan-line order. There 
are four fundamental types of raster image: bilevel, 
pseudocolor, grayscale, and truecolor. A fifth type, 
hicolor, is best considered as a variant of truecolor. In 
Figure 4.1, the five types are arranged in columns, from 
low quality at the left to high quality at the right. 

• A compressed image originates with raster image data, 
but the data has been processed to reduce storage 
and/or transmission requirements. The bottom row of 
Figure 4.1 indicates several compression methods. At 
the left are lossless (data) compression methods, gener- 
ally applicable to bilevel and pseudocolor image data; 
at the right are lossy (image) compression methods, 
generally applicable to grayscale and truecolor. 
Figure 4.1 Raster image data may be captured directly, or may be rendered from symbolic image data. Traversal from left to right corresponds to conversions that can be accomplished without loss. Some raster image formats are associated with a lookup table (LUT) or color lookup table (CLUT).

[Figure 4.1 rows: symbolic image description; raster image data; compressed image data (examples).]



The grayscale, pseudocolor, and truecolor systems used 
in computing involve lookup tables (LUTs) that map 
pixel values into monitor R'G'B' values. Most
computing systems use perceptually uniform image 
coding; however, some systems use linear-light coding, 
and some systems use other techniques. For a system to 
operate in a perceptually uniform manner, similar to or 
compatible with video, its LUTs need to be loaded with 
suitable transfer functions. If the LUTs are loaded with 
transfer functions that cause code values to be propor- 
tional to intensity, then the advantages of perceptual 
uniformity will be diminished or lost. 
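A minimal sketch of loading an 8-bit LUT, assuming a monitor with a pure 2.5-power function (the helper `build_lut` is hypothetical). For perceptually coded pixels the LUT is the identity. For linear-light pixels it must apply a 1/2.5 power, and the very first code above black then jumps to monitor code 28: the coarse steps near black show how the advantage of perceptual uniformity is lost.

```python
def build_lut(transfer, size: int = 256):
    """Build an 8-bit LUT from a transfer function on normalized values (0..1)."""
    top = size - 1
    return [round(transfer(i / top) * top) for i in range(size)]

# Perceptually coded pixel values pass straight through to monitor R'G'B'.
identity_lut = build_lut(lambda v: v)

# Linear-light pixel values need a 1/2.5 power to suit an assumed 2.5-power monitor.
linear_light_lut = build_lut(lambda v: v ** (1 / 2.5))

print(identity_lut[1], linear_light_lut[1])  # 1 versus 28: a large jump just above black
```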



Murray, James D., and William 
vanRyper, Encyclopedia of Graphics 
File Formats, Second Edition 
(Sebastopol, Calif.: O'Reilly & 
Associates, 1996). 



Many different file formats are in use for each of these 
representations. Discussion of file formats is outside the 
scope of this book. To convey photographic-quality 
color images, a file format must accommodate at least 
24 bits per pixel. To make maximum perceptual use of 
a limited number of bits per component, nonlinear 
coding should be used, as I outlined on page 12. 
Symbolic image description

Many methods are used to describe the content of 
a picture at a level of abstraction higher than directly 
enumerating the value of each pixel. Symbolic image 
data is converted to a raster image by the process of 
rasterizing. Images are rasterized (or imaged or rendered) 
by interpreting symbolic data and producing raster 
image data. In Figure 4.1, this operation passes 
information from the top row to the middle row. 

Geometric data describes the position, size, orienta- 
tion, and other attributes of objects; 3-D geometric 
data may be interpreted to produce an image from 
a particular viewpoint. Rasterizing from geometric data 
is called rendering; truecolor images are usually 
produced. 

Adobe's PostScript system is widely used to represent 
2-D illustrations, typographic elements, and publica- 
tions. PostScript is essentially a programming language 
specialized for imaging operations. When a PostScript 
file is executed by a PostScript interpreter, the image is 
rendered. (In PostScript, the rasterizing operation is 
often called raster image processing, or RIPping.) 

Once rasterized, raster image data generally cannot be 
transformed back into a symbolic description: A raster 
image - in the middle row of Figure 4.1 - generally 
cannot be returned to its description in the top row. If 
your application involves rendered images, you may 
find it useful to retain the symbolic data even after 
rendering, in case the need arises to rerender the 
image, at a different size, perhaps, or to perform 
a modification such as removing an object. 

Images from a fax machine, a video camera, or 
a grayscale or color scanner originate in raster image 
form: No symbolic description is available. Optical character recognition (OCR) and raster-to-vector techniques make brave but generally unsatisfying attempts to extract text or geometric data from raster images.
Raster images

There are four distinct types of raster image data: 

Bilevel, by definition 1 bit per pixel 

Grayscale, typically 8 bits per pixel 

Truecolor, typically 24 bits per pixel 

Pseudocolor, typically 8 bits per pixel 

Hicolor, with 16 bits per pixel, is a variant of truecolor. 

Grayscale and truecolor systems are capable of repre- 
senting continuous tone. Video systems use only true- 
color (and perhaps grayscale as a special case). 

In the following sections, I will explain bilevel, gray- 
scale, hicolor, truecolor, and pseudocolor in turn. Each 
description is accompanied by a block diagram that 
represents the hardware at the back end of the framebuffer or graphics card (including the digital-to-analog converter, DAC). Alternatively, you can consider each
block diagram to represent an algorithm that converts 
image data to monitor R', G', and B' components. 
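Taking the algorithmic view of the block diagram, the hicolor schemes of Figures 4.2 and 4.3 can be sketched as unpacking a 16-bit pixel directly into monitor R'G'B' with no LUT. Two details here are assumptions rather than statements from the text: the field order (red in the most-significant bits, the top bit unused in 5-5-5), and widening each field to 8 bits by bit replication so that full scale maps to 255.

```python
def expand(value: int, bits: int) -> int:
    """Widen an n-bit field (5 or 6 bits here) to 8 bits by bit replication."""
    return (value << (8 - bits)) | (value >> (2 * bits - 8))

def unpack_565(pixel: int):
    """16-bit 5-6-5 pixel to 8-bit R', G', B' (assumed field order R, G, B)."""
    r = (pixel >> 11) & 0x1F
    g = (pixel >> 5) & 0x3F
    b = pixel & 0x1F
    return expand(r, 5), expand(g, 6), expand(b, 5)

def unpack_555(pixel: int):
    """16-bit 5-5-5 pixel (top bit assumed unused) to 8-bit R', G', B'."""
    r = (pixel >> 10) & 0x1F
    g = (pixel >> 5) & 0x1F
    b = pixel & 0x1F
    return expand(r, 5), expand(g, 5), expand(b, 5)

print(unpack_565(0xF800), unpack_555(0x7FFF))  # (255, 0, 0) (255, 255, 255)
```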

Each pixel of a bilevel (or two-level) image comprises 
one bit, which represents either black or white - but 
nothing in between. In computing this is often called 
monochrome. (That term ought to denote shades of 
a single hue; however, in common usage - and partic- 
ularly in video - monochrome denotes the black-and- 
white, or grayscale, component of an image.) 

Since the invention of data communications, binary 
zero (0) has been known as space, and binary one (1) 
has been known as mark. A "mark" on a CRT emits 
light, so in video and in computer graphics a binary one 
(or the maximum code value) conventionally represents 
white. In printing, a "mark" deposits ink on the page, 
so in printing a binary one (or in grayscale, the 
maximum pixel value) conventionally represents black. 
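The two polarity conventions can be made concrete with a minimal Python sketch. It assumes bilevel pixels packed eight to a byte with the leftmost pixel in the most significant bit (a common arrangement, but an assumption here), and it renders each bit either as a display code, where 1 emits light (white, 255), or as a print reflectance, where 1 deposits ink (black, 0 on a scale where 255 is bare paper). The function names are hypothetical.

```python
def unpack_bilevel_row(packed: bytes, width: int) -> list[int]:
    """Unpack `width` 1-bit pixels; leftmost pixel in the MSB (assumed)."""
    return [(packed[i // 8] >> (7 - i % 8)) & 1 for i in range(width)]


def render_for_display(bits: list[int]) -> list[int]:
    # Video/computer-graphics convention: binary one ("mark") emits light -> white.
    return [255 if b else 0 for b in bits]


def render_for_print(bits: list[int]) -> list[int]:
    # Printing convention: binary one ("mark") deposits ink -> black.
    # Output is reflectance on a 0..255 scale, where 255 is bare paper.
    return [0 if b else 255 for b in bits]


row = unpack_bilevel_row(bytes([0b10100000]), 4)   # [1, 0, 1, 0]
print(render_for_display(row), render_for_print(row))
```

The same stored bits therefore produce opposite-looking images unless the receiving system knows which convention the data follows.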
Grayscale



Hicolor 



Figure 4.2 Hicolor (16-bit, 5-5-5) graphics
provides 2^15, or 32768 colors
("thousands of colors"). Note
the absence of LUTs: Image
data is perceptually coded,
relying upon the implicit
2.5-power function of the
monitor. D/A signifies digital-to-analog conversion.



Figure 4.3 Hicolor (16-bit,
5-6-5) graphics provides
2^16, or 65536 colors. Like
the 5-5-5 scheme, image
data is perceptually coded.
An extra bit is assigned to
the green channel.
A grayscale image represents an effectively continuous
range of tones, from black, through intermediate shades 
of gray, to white. A grayscale system with a sufficient 
number of bits per pixel, 8 bits or more, can represent 
a black-and-white photograph. A grayscale system may 
or may not have a lookup table (LUT); it may or may 
not be perceptually uniform. 

In printing, a grayscale image is said to have continuous 
tone, or contone (distinguished from line art or type). 
When a contone image is printed, halftoning is ordi- 
narily used. 

Hicolor graphics systems store 16-bit pixels, partitioned 
into R', G', and B' components. Two schemes are in 
use. In the 5-5-5 scheme, sketched in Figure 4.2 below, 
each pixel comprises 5 bits for each of red, green, and 
blue. (One bit remains unused, or is used as a one-bit 
transparency mask - a crude "alpha" component. See 
page 334.) In the 5-6-5 scheme, sketched in Figure 4.3 
below, each pixel comprises 5 bits of red, 6 bits of 
green, and 5 bits of blue. 
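A minimal sketch of packing and unpacking the two hicolor layouts follows. The bit positions (red in the most significant field, blue in the least, and the spare 5-5-5 bit at the top) are assumptions for illustration; actual hardware and file formats differ in field ordering and byte order.

```python
def pack_555(r: int, g: int, b: int, mask: int = 0) -> int:
    """Pack 5-bit R', G', B' (0..31) and an optional 1-bit mask into 16 bits.
    Assumed layout: bit 15 mask, bits 14-10 R', bits 9-5 G', bits 4-0 B'."""
    if not all(0 <= v <= 31 for v in (r, g, b)) or mask not in (0, 1):
        raise ValueError("component out of range")
    return (mask << 15) | (r << 10) | (g << 5) | b


def unpack_555(p: int) -> tuple[int, int, int, int]:
    return (p >> 10) & 0x1F, (p >> 5) & 0x1F, p & 0x1F, (p >> 15) & 1


def pack_565(r: int, g: int, b: int) -> int:
    """Assumed layout: bits 15-11 R' (5 bits), 10-5 G' (6 bits), 4-0 B' (5 bits)."""
    if not (0 <= r <= 31 and 0 <= g <= 63 and 0 <= b <= 31):
        raise ValueError("component out of range")
    return (r << 11) | (g << 5) | b


def unpack_565(p: int) -> tuple[int, int, int]:
    return (p >> 11) & 0x1F, (p >> 5) & 0x3F, p & 0x1F


# Distinct colors: 2**15 for 5-5-5 (mask bit ignored), 2**16 for 5-6-5.
print(2 ** 15, 2 ** 16)
```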
Truecolor



Most truecolor systems have 
LUTs as by-products of their 
capability to handle pseudo- 
color, where like-sized CLUTs 
are necessary. 



R'G'B' codes in hicolor systems are directly applied to 
the DACs, and are linearly translated into monitor 
voltage with no intervening LUT. The response of the 
monitor produces luminance proportional to the 2.5- 
power of voltage. So, hicolor image coding is perceptu- 
ally uniform, comparable to video R'G'B' coding. 
However, 32 (or even 64) gradations of each compo- 
nent are insufficient for photographic-quality images. 
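To see why 32 or 64 gradations fall short, the sketch below models a component code applied directly to the DAC, with the monitor producing luminance proportional to the 2.5-power of the normalized code, and computes the fractional luminance increase between adjacent codes at mid-scale. A zero black level and an exact 2.5 exponent are simplifying assumptions.

```python
def relative_luminance(code: int, bits: int, exponent: float = 2.5) -> float:
    """Luminance relative to white for a perceptually coded component
    sent straight to the DAC (no LUT), assuming a pure power-law monitor."""
    return (code / (2 ** bits - 1)) ** exponent


def mid_scale_step(bits: int) -> float:
    """Fractional luminance increase from the mid-scale code to the next code."""
    mid = 2 ** (bits - 1)
    return relative_luminance(mid + 1, bits) / relative_luminance(mid, bits) - 1


for bits in (5, 6, 8):
    print(bits, "bits:", round(100 * mid_scale_step(bits), 1), "% per step")
```

At mid-scale the steps come out at roughly 16% for 5 bits, 8% for 6 bits, and 2% for 8 bits, which illustrates why hicolor gradations are too coarse for photographic images.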

A truecolor system has separate red, green, and blue 
components for each pixel. In most truecolor systems, 
each component is represented by a byte of 8 bits: Each 
pixel has 24 bits of color information, so this mode is 
often called "24-bit color" (or "millions of colors"). The 
RGB values of each pixel can represent 2^24, or about
16.7 million, distinct codes. In computing, a truecolor 
framebuffer usually has three lookup tables (LUTs), one 
for each component. The LUTs and DACs of a 24-bit 
truecolor system are sketched in Figure 4.4 below. 

The mapping from image code value to monitor voltage 
is determined by the content of the LUTs. Owing to the 
perceptually uniform nature of the monitor, the best 
perceptual use is generally made of truecolor pixel 
values when each LUT contains an identity function 
("ramp") that maps input to output, unchanged. 
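The truecolor signal path of three independent LUTs can be sketched as follows. The sketch assumes a 24-bit pixel packed as 0xRRGGBB and 256-entry LUTs of 8-bit values; the identity ramp passes each component through unchanged, which is the setting recommended above for perceptually coded data.

```python
IDENTITY_RAMP = list(range(256))


def through_luts(pixel: int, lut_r: list[int], lut_g: list[int], lut_b: list[int]):
    """Split an assumed 0xRRGGBB pixel and map each component through its own LUT,
    returning the three codes presented to the DACs."""
    r, g, b = (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF
    return lut_r[r], lut_g[g], lut_b[b]


print(through_luts(0x123456, IDENTITY_RAMP, IDENTITY_RAMP, IDENTITY_RAMP))
print(2 ** 24)  # number of distinct 24-bit codes
```

Loading any other table (an inverting ramp, say) changes every displayed pixel without touching the image data itself.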



Figure 4.4 Truecolor (24-bit) 
graphics usually involves three 
programmable lookup tables 
(LUTs). The numerical values 
shown here are from the 
default Macintosh LUT. In 
video, R'G'B' values are trans- 
mitted to the DACs with no 
intervening lookup table. To 
make a truecolor computer 
system display video properly, 
the LUTs must be loaded with 
ramps that map input to 
output unchanged. 



Poynton, Charles, “The rehabilitation of gamma,” in Rogowitz, B.E.,
and T. N. Pappas, eds., Human Vision
and Electronic Imaging III, Proc.
SPIE/IS&T Conf. 3299 (Bellingham,
Wash.: SPIE, 1998).



Concerning alpha, see page 334. 



In computing, the LUTs can be set to implement an 
arbitrary mapping from code value to tristimulus value 
(and so, to intensity). The total number of pixel values 
that represent distinguishable colors depends upon the 
transfer function used. If the LUT implements a power 
function to impose gamma correction on linear-light 
data, then the code-100 problem will be at its worst. 
With 24-bit color and a properly chosen transfer func- 
tion, photographic quality images can be displayed and 
geometric objects can be rendered smoothly shaded 
with sufficiently high quality for many applications. But 
if the LUTs are set for linear-light representation with 
8 bits per component, contouring will be evident in 
many images, as I mentioned on page 12. Having 24-bit 
truecolor is not a guarantee of good image quality. If 
a scanner claims to have 30 bits (or 36 bits) per pixel, 
obviously each component has 10 bits (or 12 bits). 
However, it makes a great deal of difference whether
these values are coded physically (as linear-light luminance,
loosely "intensity"), or coded perceptually (as
a quantity comparable to lightness).
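The code-100 problem mentioned above can be checked with a few lines of arithmetic. With 8-bit codes proportional to intensity (linear-light coding), stepping from code c to c + 1 raises intensity by the fraction 1/c. Taking the visibility threshold for a luminance difference to be about 1% (an assumed round figure), every code below 100 steps by more than the threshold and risks contouring, while codes well above 100 are finer than necessary.

```python
THRESHOLD = 0.01  # assumed just-noticeable fractional luminance difference


def linear_light_step(code: int) -> float:
    """Fractional intensity increase from `code` to `code + 1` when codes are
    proportional to intensity: (code + 1) / code - 1 == 1 / code."""
    return 1 / code


coarse_codes = [c for c in range(1, 256) if linear_light_step(c) > THRESHOLD]
print(len(coarse_codes), "codes step by more than 1%; the largest is", max(coarse_codes))
```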

In video, either the LUTs are absent, or each is set to 
the identity function. Studio video systems are effec- 
tively permanently wired in truecolor mode with 
perceptually uniform coding: Code values are presented 
directly to the DACs, without intervening lookup tables. 

It is easiest to design a framebuffer memory system 
where each pixel has a number of bytes that is a power 
of two; so, a truecolor framebuffer often has four bytes 
per pixel - "32-bit color." Three bytes comprise the red, 
green, and blue color components; the fourth byte is 
used for purposes other than representing color. The 
fourth byte may contain overlay information. Alterna- 
tively, it may store an alpha component (a) repre- 
senting opacity from zero (fully transparent) to unity 
(fully opaque). In computer graphics, the alpha compo- 
nent conventionally multiplies components that are 
coded in the linear-light domain. In video, the corre- 
sponding component is called linear key, but the key 
signal is not typically proportional to tristimulus value 
(linear light) - instead, linear refers to code level, which 
is nonlinearly related to intensity. 
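The difference between computer-graphics alpha and a video linear key can be illustrated by mixing a white foreground over a black background at 50%. The sketch assumes that 8-bit codes are perceptually coded and that decoding to linear light is a pure 2.5-power function (the chapter's idealized monitor); the helper names are hypothetical. An 8-bit alpha byte a would be used as a / 255.

```python
EXPONENT = 2.5  # assumed decoding power function, as for the idealized monitor


def to_linear(code: int) -> float:
    return (code / 255) ** EXPONENT


def to_code(linear: float) -> int:
    return int(255 * linear ** (1 / EXPONENT) + 0.5)


def composite_linear_light(fg: int, bg: int, alpha: float) -> int:
    """Computer-graphics style: alpha multiplies linear-light components."""
    return to_code(alpha * to_linear(fg) + (1 - alpha) * to_linear(bg))


def mix_linear_key(fg: int, bg: int, key: float) -> int:
    """Video style: the key is linear in code value, not in light."""
    return int(key * fg + (1 - key) * bg + 0.5)


print(composite_linear_light(255, 0, 0.5), mix_linear_key(255, 0, 0.5))
```

The linear-light blend lands at code 193 rather than 128: half the light of white, re-encoded perceptually, sits well above half the code range.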
Figure 4.5 Pseudocolor
(8-bit) graphics systems 
use a limited number of 
integers, usually 0 
through 255, to repre- 
sent colors. Each pixel 
value is processed 
through a color lookup 
table (CLUT) to obtain 
red, green, and blue 
output values to be 
delivered to the monitor. 







Pseudocolor

In a pseudocolor (or indexed color, or colormapped)
system, several bits - usually 8 - comprise each pixel in 
an image or framebuffer. This provides a moderate 
number of unique codes - usually 256 - for each pixel. 
Pseudocolor involves "painting by numbers," where the 
number of colors is rather small. In an 8-bit pseudo- 
color system, any particular image, or the content of 
the framebuffer at any instant in time, is limited to 
a selection of just 2^8 (or 256) colors from the universe
of available colors. 



I reserve the term CLUT for 
pseudocolor. In grayscale and 
truecolor systems, the LUTs store 
transfer functions, not colors. In 
Macintosh, pseudocolor CLUT 
values are roughly, but not 
optimally, perceptually coded. 



Each code value is used as an index into a color lookup 
table (CLUT, colormap, or palette) that retrieves R'G'B' 
components; the DAC translates these linearly into 
voltage levels that are applied to the monitor. (Macin- 
tosh is an exception: Image data read from the CLUT is 
in effect passed through a second LUT.) Pseudocolor 
CLUT values are effectively perceptually coded. 



The CLUT and DACs of an 8-bit pseudocolor system are 
sketched in Figure 4.5 above. A typical lookup table 
retrieves 8-bit values for each of red, green, and blue, 
so each of the 256 different colors can be chosen from 
a universe of 2^24, or 16777216, colors. (The CLUT may
return 4, 6, or more than 8 bits for each component.) 
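Pseudocolor display amounts to an array lookup, as in this minimal sketch. The three-entry CLUT and the pixel indices are made-up illustration data; a real 8-bit CLUT has up to 256 entries. Because the image stores only indices, reloading the CLUT recolors the whole image without touching pixel data.

```python
def apply_clut(indices: list[int], clut: list[tuple[int, int, int]]):
    """Map each pixel index to the R'G'B' triple stored in the CLUT."""
    if len(clut) > 256:
        raise ValueError("an 8-bit pseudocolor CLUT has at most 256 entries")
    return [clut[i] for i in indices]


image = [2, 0, 1, 1]                                  # 8-bit pixel codes
clut = [(0, 0, 0), (255, 0, 0), (0, 0, 255)]          # black, red, blue
print(apply_clut(image, clut))

clut[1] = (0, 255, 0)                                 # reload one entry: red -> green
print(apply_clut(image, clut))
```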



Pseudocolor image data is always accompanied by the 
associated colormap (or palette). The colormap may be 
fixed, independent of the image, or it may be specific to
the particular image (adaptive or optimized).



The browser-safe palette forms 
a radix-6 number system with 
RGB digits valued 0 through 5. 



216 = 6^3



A popular choice for a fixed CLUT is the browser-safe
palette comprising the 216 colors formed by combinations
of 8-bit R', G', and B' values chosen from the set
{0, 51, 102, 153, 204, 255}. This set of 216 colors fits
nicely within an 8-bit pseudocolor CLUT; the colors are
perceptually distributed throughout the R'G'B' cube.
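The radix-6 structure of the browser-safe palette makes it easy to generate and to search. This sketch builds the 216 entries with index 36r + 6g + b for digits r, g, b in 0 through 5 (an assumed ordering; palettes in the wild may order entries differently) and finds the nearest entry by rounding each component independently to the nearest multiple of 51, which is valid because the palette is a separable grid.

```python
LEVELS = (0, 51, 102, 153, 204, 255)


def browser_safe_palette() -> list[tuple[int, int, int]]:
    return [(LEVELS[r], LEVELS[g], LEVELS[b])
            for r in range(6) for g in range(6) for b in range(6)]


def nearest_safe_index(r: int, g: int, b: int) -> int:
    """Round each 8-bit component to the nearest radix-6 digit, then combine."""
    def digit(v: int) -> int:
        return min(5, (v + 25) // 51)
    return 36 * digit(r) + 6 * digit(g) + digit(b)


palette = browser_safe_palette()
print(len(palette), palette[nearest_safe_index(250, 10, 100)])
```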

Pseudocolor is appropriate for images such as maps, 
schematic diagrams, or cartoons, where each color or 
combination is either completely present or completely 
absent at any point in the image. In a typical CLUT, 
adjacent pseudocolor codes are generally completely 
unrelated; for example, the color assigned to code 42 
has no necessary relationship to the color assigned to 
code 43. 

Conversion among types 

In Figure 4.1, traversal from left to right corresponds to 
conversions that can be accomplished without loss. 

Disregarding pseudocolor for the moment, data in any 
of the other four schemes of Figure 4.1 can be 
"widened" to any scheme to the right simply by 
assigning the codes appropriately. For example, 
a grayscale image can be widened to truecolor by 
assigning codes from black to white. Widening adds 
bits but not information. 
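Widening can be sketched as follows. Bilevel and grayscale values are replicated into wider codes; for hicolor to truecolor, the sketch uses bit replication (copying a component's high bits into the vacated low bits), one common choice that maps the 5-bit or 6-bit extremes to 0 and 255. The packed 5-6-5 layout with red in the top bits is an assumption.

```python
def bilevel_to_grayscale(bit: int) -> int:
    return 255 if bit else 0


def grayscale_to_truecolor(y: int) -> tuple[int, int, int]:
    return (y, y, y)                   # equal R', G', B' is a neutral gray


def widen(value: int, bits: int) -> int:
    """Extend a `bits`-wide component (4 <= bits <= 8) to 8 bits by bit replication."""
    return (value << (8 - bits)) | (value >> (2 * bits - 8))


def hicolor565_to_truecolor(p: int) -> tuple[int, int, int]:
    r, g, b = (p >> 11) & 0x1F, (p >> 5) & 0x3F, p & 0x1F
    return widen(r, 5), widen(g, 6), widen(b, 5)
```

Shifting a widened value right recovers the original code exactly, which is the sense in which widening adds bits but not information.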

A pseudocolor image can be converted to hicolor or 
truecolor through software application of the CLUT. 
Conversion to hicolor is subject to the limited number 
of colors available in hicolor mode. Conversion to true- 
color can be accomplished without loss, provided that 
the truecolor LUTs are sensible. 

Concerning conversions in the reverse direction, an 
image can be "narrowed" without loss only if it contains 
only the colors or shades available in the mode to its 
left in Figure 4.1; otherwise, the conversion will involve 
loss of shades and/or loss of colors. 
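Whether narrowing is lossless can be tested directly. The sketch below checks for a neutral-only image (lossless to grayscale) and builds an image-specific CLUT when a truecolor image has no more than 256 distinct colors (lossless to 8-bit pseudocolor); otherwise it refuses, since some colors would have to be merged. The function names are hypothetical.

```python
def narrowable_to_grayscale(pixels: list[tuple[int, int, int]]) -> bool:
    return all(r == g == b for r, g, b in pixels)


def narrow_to_pseudocolor(pixels: list[tuple[int, int, int]]):
    """Return (indices, clut) if lossless narrowing to 8-bit pseudocolor is possible."""
    clut = sorted(set(pixels))
    if len(clut) > 256:
        raise ValueError(f"{len(clut)} distinct colors; an 8-bit CLUT holds 256")
    index_of = {color: i for i, color in enumerate(clut)}
    return [index_of[p] for p in pixels], clut


image = [(0, 0, 0), (255, 0, 0), (0, 0, 0)]
indices, clut = narrow_to_pseudocolor(image)
assert [clut[i] for i in indices] == image       # round trip is exact
```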
Ashdown, Ian, Color Quantization Bibliography, Internet,
<ftp://ftp.ledalite.com/pub/cquant97.bib>



Figure 4.6 Display modes




A truecolor or hicolor image can be approximated in 
pseudocolor through software application of a fixed 
colormap. Alternatively, a colormap quantization algo- 
rithm can be used to examine a particular image (or 
sequence of images), and compute a colormap that is 
optimized or adapted for that image or sequence. 
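As an illustration of colormap quantization, here is the simple popularity method: choose the most frequent colors as the colormap, then map every pixel to its nearest entry by squared distance in R'G'B'. This is a swapped-in, deliberately basic technique; median-cut and other algorithms in the literature usually give better results.

```python
from collections import Counter


def popularity_colormap(pixels, size=256):
    """Colormap of the `size` most frequent colors in the image."""
    return [color for color, _ in Counter(pixels).most_common(size)]


def map_to_colormap(pixels, cmap):
    """Index of the nearest colormap entry (squared R'G'B' distance) per pixel."""
    def nearest(p):
        return min(range(len(cmap)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(p, cmap[i])))
    return [nearest(p) for p in pixels]


pixels = [(0, 0, 0)] * 5 + [(255, 255, 255)] * 3 + [(250, 250, 250)]
cmap = popularity_colormap(pixels, size=2)
print(cmap, map_to_colormap(pixels, cmap))
```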

Display modes 

A high data rate is necessary to refresh a PC or work- 
station display from graphics memory. Consequently, 
graphics memory has traditionally been implemented 
with specialized "video RAM" (VRAM) devices. A low- 
cost graphics adapter generally has a limited amount of 
this specialized memory, perhaps just one or two mega- 
bytes. Recently, it has become practical for graphics 
adapters to refresh from main memory (DRAM); this 
relaxes the graphics memory capacity constraint. 

Modern PC graphics subsystems are programmable 
among pseudocolor, hicolor, and truecolor modes. 
(Bilevel and grayscale have generally fallen into disuse.) 
The modes available in a typical system are restricted by 
the amount of graphics memory available. Figure 4.6 
sketches the three usual modes available in a system 
having one megabyte (1 MB) of VRAM. 

The top sketch illustrates truecolor (24 bits per pixel) 
operation. With just 1 MB of VRAM, the pixel count will
be limited to 1/3 megapixel, 640x480 ("VGA"). The
advantage is that this mode gives access to millions of 
colors simultaneously. 

To increase pixel count to half a megapixel with just 
1 MB of VRAM, the number of bits per pixel must be 
reduced from 24. The middle sketch shows hicolor 
(16 bits per pixel) mode, which increases the pixel count
to 1/2 megapixel, 800x600. However, the display is now
limited to just 65536 colors at any instant. 

To obtain a one-megapixel display, say 1152x864, pixel 
depth is limited by 1 MB of VRAM to just 8 bits. This 
forces the use of pseudocolor mode, and limits the 
number of possible colors at any instant to just 256. 
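The arithmetic behind Figure 4.6 is simply columns times rows times bytes per pixel. This sketch assumes that 1 MB means 2^20 bytes and that 24-bit pixels are packed in three bytes.

```python
VRAM_BYTES = 2 ** 20  # 1 MB

MODES = {  # mode: (columns, rows, bits per pixel)
    "truecolor": (640, 480, 24),
    "hicolor": (800, 600, 16),
    "pseudocolor": (1152, 864, 8),
}


def framebuffer_bytes(cols: int, rows: int, bits_per_pixel: int) -> int:
    return cols * rows * bits_per_pixel // 8


for name, (cols, rows, bpp) in MODES.items():
    need = framebuffer_bytes(cols, rows, bpp)
    print(f"{name:12} {cols}x{rows} {bpp:2} bits: {need:>9,} bytes, fits: {need <= VRAM_BYTES}")
```

At four bytes per pixel ("32-bit color"), 640x480 would need 1,228,800 bytes, so truecolor in 1 MB depends on three-byte packing.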
Image width is the count of image
columns divided by the so-called
resolution (in pixels per inch); height
is computed similarly from the count
of image rows.



A point is a unit of distance equal
to 1/72 inch. The width of the stem
of this bold letter I is one point,
about 0.353 mm (that is, 353 µm).
In addition to constraining the relationship between
pixel count and pixel depth, a display system may 
constrain the maximum pixel rate. A pixel rate 
constraint - 100 megapixels per second, for example - 
may limit the refresh rate at high pixel counts. 
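A pixel-rate limit translates into a refresh-rate ceiling by dividing by the number of pixel clocks per frame. This sketch ignores blanking by default; the optional overhead fraction (a hypothetical parameter) accounts for blanking intervals, which add clocks beyond the active pixels.

```python
def max_refresh_hz(pixel_rate: float, cols: int, rows: int,
                   blanking_overhead: float = 0.0) -> float:
    """Highest refresh rate a pixel-rate limit permits at a given pixel count."""
    clocks_per_frame = cols * rows * (1 + blanking_overhead)
    return pixel_rate / clocks_per_frame


print(round(max_refresh_hz(100e6, 1600, 1200), 1))        # blanking ignored
print(round(max_refresh_hz(100e6, 1600, 1200, 0.25), 1))  # 25% assumed blanking
```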

A computer specialist might refer to display pixel count, 
such as 640x480, 800x600, or 1152x864, as "resolu- 
tion." An image scientist gives resolution a much more 
specific meaning; see Resolution, on page 65. 

Image files 

Images in bilevel, grayscale, pseudocolor, or truecolor
formats can be stored in files. A general-purpose image 
file format stores, in its header information, the count 
of columns and rows of pixels in the image. 

Many file formats - such as TIFF and EPS - store infor- 
mation about the intended size of the image. The 
intended image width and height can be directly 
stored, in absolute units such as inches or millimeters. 
Alternatively, the file can store sample density in units 
of pixels per inch (ppi), or less clearly, dots per inch (dpi). 
Sample density is often confusingly called "resolution."

In some software packages, such as Adobe Illustrator, 
the intended image size coded in a file is respected. In 
other software, such as Adobe Photoshop, viewing at 
100% implies a 1:1 relationship between file pixels and 
display device pixels, disregarding the number of pixels 
per inch in the file and of the display. Image files 
without size information are often treated as having 
72 pixels per inch; application software unaware of 
image size information often uses a default of 72 ppi. 
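Intended size follows from pixel count and sample density: width in inches is the column count divided by pixels per inch. The sketch falls back to 72 ppi when a file carries no size information, mirroring the common default described above; the function name is hypothetical. At 72 ppi, one pixel spans one point (1/72 inch, about 0.353 mm).

```python
from typing import Optional

MM_PER_INCH = 25.4
DEFAULT_PPI = 72.0  # common fallback when a file has no size information


def intended_size_inches(cols: int, rows: int, ppi: Optional[float] = None):
    """Intended (width, height) in inches from pixel counts and sample density."""
    if ppi is None:
        ppi = DEFAULT_PPI
    return cols / ppi, rows / ppi


w, h = intended_size_inches(720, 576)
print(w, h, w * MM_PER_INCH)                   # inches, and width in mm
print(intended_size_inches(600, 300, ppi=300))
```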

"Resolution" in computer graphics 

In computer graphics, a pixel is often associated with an 
intensity distribution uniformly covering a small square 
area of the screen. In liquid crystal displays (LCDs), 
plasma display panels (PDPs), and digital micromirror 
displays (DMDs), discrete pixels such as these are 
constructed on the display device. When such a display 
is driven digitally at native pixel count, there is a one- 
to-one relationship between framebuffer pixels and 

device pixels. However, a graphics subsystem may
resample by primitive means when faced with 
a mismatch between framebuffer pixel count and 
display device pixel count. If framebuffer count is 
higher, pixels are dropped; if lower, pixels are repli- 
cated. In both instances, image quality suffers. 
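The primitive resampling described above is nearest-neighbor selection. In this one-dimensional sketch (the function is hypothetical, not any particular driver's algorithm), each output pixel copies the input pixel at index floor(i × old_len / new_len): enlarging repeats pixels, reducing skips them, and neither interpolates.

```python
def resample_nearest(row: list, new_len: int) -> list:
    """Nearest-neighbor resampling of one image row to `new_len` pixels."""
    old_len = len(row)
    return [row[i * old_len // new_len] for i in range(new_len)]


print(resample_nearest([1, 2, 3, 4], 8))        # enlarged: each pixel replicated
print(resample_nearest([1, 2, 3, 4, 5, 6], 4))  # reduced: pixels 3 and 6 dropped
```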

CRT displays typically have a Gaussian distribution of 
light from each pixel, as I will discuss in the next 
chapter. The typical spot size is such that there is some 
overlap in the distributions of light from adjacent pixels. 
You might think that overlap between the distributions 
of light produced by neighboring display elements, as in 
a CRT, is undesirable. However, image display requires 
a certain degree of overlap in order to minimize the 
visibility of pixel structure or scan-line structure. I will 
discuss this issue in Image structure, on page 43. 

Two disparate measures are referred to as resolution in 
computing: 

The count of image columns and image rows - that is, 
columns and rows of pixels - in a framebuffer 

The number of pixels per inch (ppi) intended for image 
data (often misleadingly denoted dots per inch, dpi) 

An image scientist considers resolution to be delivered 
to the viewer; resolution is properly estimated from 
information displayed at the display surface (or screen) 
itself. The two measures above both limit resolution, but
neither of them quantifies resolution directly. In Resolu- 
tion, on page 65, I will describe how the term is used in 
image science and video. 
Don't confuse PSF with
progressive segmented-frame 
(PsF), described on page 62. 




Figure 5.1 "Box" reconstruc- 
tion of a bitmapped graphic 
image is shown. 



A naive approach to digital imaging treats the image as 
a matrix of independent pixels, disregarding the spatial 
distribution of light intensity across each pixel. You 
might think that optimum image quality is obtained 
when there is no overlap between the distributions of 
neighboring pixels; many computer engineers hold this 
view. However, continuous-tone images are best repro- 
duced if there is a certain degree of overlap; sharpness 
is reduced slightly, but pixel structure is made less 
visible and image quality is improved. 

The distribution of intensity across a displayed pixel is 
referred to as its point spread function (PSF). A one- 
dimensional slice through the center of a PSF is collo- 
quially called a spot profile. A display's PSF influences 
the nature of the images it reproduces. The effects of 
a PSF can be analyzed using filter theory, which I will 
discuss for one dimension in the chapter Filtering and 
sampling, on page 141, and for two dimensions in 
Image digitization and reconstruction, on page 187. 

A pixel whose intensity distribution uniformly covers 
a small square area of the screen has a point spread 
function referred to as a "box." PSFs used in contin- 
uous-tone imaging systems usually peak at the center of 
the pixel, fall off over a small distance, and overlap 
neighboring pixels to some extent. 
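The two kinds of PSF can be compared in one dimension by summing each sample's PSF on a grid finer than the pixel pitch. In this sketch the pitch is 1, samples sit at pixel centers, the box is exactly one pixel wide, and the Gaussian has an assumed standard deviation of half a pixel, normalized to unit area so that a uniform field reconstructs to nearly uniform intensity.

```python
import math


def box_psf(d: float) -> float:
    """Uniform over one pixel; no overlap with neighbors."""
    return 1.0 if -0.5 <= d < 0.5 else 0.0


def gaussian_psf(d: float, sigma: float = 0.5) -> float:
    """Unit-area Gaussian; sigma (in pixels) is an assumed spot size."""
    return math.exp(-d * d / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))


def reconstruct(samples: list[float], psf, oversample: int = 4) -> list[float]:
    """Intensity at `oversample` points per pixel: sum of PSFs centered on samples."""
    out = []
    for j in range(len(samples) * oversample):
        x = (j + 0.5) / oversample
        out.append(sum(s * psf(x - (k + 0.5)) for k, s in enumerate(samples)))
    return out


print(reconstruct([0, 1, 0], box_psf))          # hard-edged, no spill into neighbors
print([round(v, 2) for v in reconstruct([0, 1, 0], gaussian_psf)])  # peaked, overlapping
```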

Image reconstruction 

Figure 5.1 reproduces a portion of a bitmapped (bilevel) 
graphic image, part of a computer's desktop display. 
Each sample is either black or white. The element with 
horizontal "stripes" is part of a window's titlebar; the
checkerboard background is intended to integrate to 
gray. Figure 5.1 shows reconstruction of the image with 
a "box" distribution. Each pixel is uniformly shaded 
across its extent; there is no overlap between pixels. 
This typifies an image as displayed on an LCD. 




Figure 5.2 Gaussian recon- 
struction is shown for the 
same bitmapped image as 
Figure 5.1. I will detail the 
one-dimensional Gaussian 
function on page 150. 



A CRT's electron gun produces an electron beam that 
illuminates a spot on the phosphor screen. The beam is 
deflected to form a raster pattern of scan lines that 
traces the entire screen, as I will describe in the 
following chapter. The beam is not perfectly focused 
when it is emitted from the CRT's electron gun, and is 
dispersed further in transit to the phosphor screen. 
Intensity produced for each pixel at the face of the 
screen has a "bell-shaped" distribution resembling 
a two-dimensional Gaussian function. With a typical
amount of spot overlap, the checkerboard area of this
example will display as a nearly uniform gray as
depicted in Figure 5.2 in the margin. You might think
that the blur caused by overlap between pixels would
diminish image quality. However, for continuous-tone
("contone") images, some degree of overlap is not only 
desirable but necessary, as you will see from the 
following examples. 

Figure 5.3 at the top of the facing page shows a 16x20- 
pixel image of a dark line, slightly more than one pixel 
wide, at an angle 1.2° off-vertical. At the left, the image
data is reconstructed using a box distribution. The 
jagged and "ropey" nature of the reproduction is 
evident. At the right, the image data is reconstructed 
using a Gaussian. It is blurry, but less jagged. 



I introduced visual acuity on page 8. 
For details, see Contrast sensitivity 
function (CSF), on page 201. 



Figure 5.4 in the middle of the facing page shows two 
ways to reconstruct the same 16x20 pixels (320 bytes) 
of continuous-tone grayscale image data. The left-hand 
image is reconstructed using a box function, and the 
right-hand image with a Gaussian. I constructed this 
example so that each image is 4 cm (1.6 inches) wide. 
At a typical reading distance of 40 cm (16 inches), a pixel
subtends 0.4°, where visual acuity is near its maximum. 
At this distance, when reconstructed with a box func- 
tion, the pixel structure of each image is highly visible; 
Figure 5.3 Diagonal line reconstruction.
At the left is a near-vertical
line slightly more than
1 pixel wide, rendered as an 
array 20 pixels high that has 
been reconstructed using a box 
distribution. At the right, the line 
is reconstructed using a Gaussian 
distribution. Between the images 
I have placed a set of markers to 
indicate the vertical centers of 
the image rows. 



Figure 5.4 Contone image 
reconstruction. At the left is 
a continuous-tone image of 
16x20 pixels that has been 
reconstructed using a box distri- 
bution. The pictured individual 
cannot be recognized. At the 
right is exactly the same image 
data, but reconstructed by a 
Gaussian function. The recon- 
structed image is very blurry but 
recognizable. Which reconstruc- 
tion function do you think is best 
for continuous-tone imaging? 






Figure 5.5 One frame of 
an animated sequence 



visibility of the pixel structure overwhelms the percep- 
tion of the image itself. The bottom right image is 
reconstructed using a Gaussian distribution. It is blurry, 
but easily recognizable as an American cultural icon. 
This example shows that sharpness is not always good, 
and blurriness is not always bad!
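The viewing geometry quoted for Figure 5.4 can be checked numerically. Each 4 cm wide image has 16 pixel columns, so a pixel is 0.25 cm wide; the sketch computes the visual angle it subtends at 40 cm and, for comparison, at 10 m.

```python
import math


def subtense_degrees(size: float, distance: float) -> float:
    """Visual angle subtended by an object of `size` at `distance` (same units)."""
    return math.degrees(2 * math.atan(size / (2 * distance)))


pixel_cm = 4.0 / 16
print(round(subtense_degrees(pixel_cm, 40), 2), "degrees at 40 cm")
print(round(60 * subtense_degrees(pixel_cm, 1000), 2), "minutes of arc at 10 m")
```

The results, about 0.36° and about 0.86 minute of arc, agree with the rounded figures of 0.4° and one minute of arc used in this chapter.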

Figure 5.5 in the margin shows a 16x20-pixel image 
comprising 20 copies of the top row of Figure 5.3 (left). 
Consider a sequence of 20 animated frames, where 
each frame is formed from successive image rows of 
Figure 5.3. The animation would depict a narrow 
vertical line drifting rightward across the screen at a rate 
of 1 pixel every 8 frames. If image rows of Figure 5.3 
(left) were used, the width of the moving line would 
appear to jitter frame-to-frame, and the minimum light- 
ness would vary. With Gaussian reconstruction, as in 
Figure 5.3 (right), motion portrayal is much smoother. 
Sampling aperture

In a practical image sensor, each element acquires infor- 
mation from a finite region of the image plane; the 
value of each pixel is a function of the distribution of 
intensity over that region. The distribution of sensi- 
tivity across a pixel of an image capture device is 
referred to as its sampling aperture, sort of a PSF in 
reverse - you could call it a point "collection" function. 
The sampling aperture influences the nature of the 
image signal originated by a sensor. Sampling apertures 
used in continuous-tone imaging systems usually peak 
at the center of each pixel, fall off over a small distance, 
and overlap neighboring pixels to some extent. 




Figure 5.6 Moire pattern,
a form of aliasing in two dimensions,
results when a sampling
pattern (here the perforated
square) has a sampling density
that is too low for the image
content (here the dozen bars,
14° off-vertical). This figure is
adapted from Fig. 3.12 of
Wandell’s Foundations of Vision
(cited on page 195).



In 1928, Harry Nyquist published a landmark paper
stating that a sampled analog signal cannot be reconstructed
accurately unless all of its frequency components
are contained strictly within half the sampling
frequency. This condition subsequently became known
as the Nyquist criterion; half the sampling rate became
known as the Nyquist frequency. Nyquist developed his
theorem for one-dimensional signals, but it has been 
extended to two dimensions. In a digital system, it 
takes at least two elements - two pixels or two scan- 
ning lines - to represent a cycle. A cycle is equivalent to 
a line pair of film, or two "TV lines" (TVL). 
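Folding makes the Nyquist criterion concrete. In this sketch, frequencies are in cycles per sample (so half the sampling rate is 0.5 cycle per sample); a sinusoid above that limit yields exactly the same samples as one folded back below it, so after sampling the two cannot be told apart. The function names are hypothetical.

```python
import math


def alias_frequency(f: float, fs: float = 1.0) -> float:
    """Apparent frequency, in 0..fs/2, of a sinusoid of frequency f sampled at fs."""
    f = f % fs
    return min(f, fs - f)


def sample_cosine(f: float, n: int, fs: float = 1.0) -> list[float]:
    return [math.cos(2 * math.pi * f * k / fs) for k in range(n)]


print(alias_frequency(0.7))  # 0.7 cycle/pixel exceeds the limit and folds to about 0.3
hi, lo = sample_cosine(0.7, 16), sample_cosine(0.3, 16)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(hi, lo)))
```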

In Figure 5.6 in the margin, the black square punctured 
by a regular array of holes represents a grid of small 
sampling apertures. Behind the sampling grid is a set of 
a dozen black bars, tilted 14° off the vertical, repre- 
senting image information. In the region where the 
image is sampled, you can see three wide dark bars 
tilted at 45°. Those bars represent spatial aliases that 
arise because the number of bars per inch (or mm) in 
the image is greater than half the number of apertures 
per inch (or mm) in the sampling lattice. Aliasing can be 
prevented - or at least minimized - by imposing 
a spatial filter in front of the sampling process, as I will 
describe for one-dimensional signals in Filtering and 
sampling, on page 141, and for two dimensions in 
Image presampling filters, on page 192. 
Figure 5.7 Bitmapped
graphic image, rotated 



Nyquist explained that an arbitrary signal can be recon- 
structed accurately only if more than two samples are 
taken of the highest-frequency component of the signal. 
Applied to an image, there must be at least twice as 
many samples per unit distance as there are image 
elements. The checkerboard pattern in Figure 5.1 (on 
page 43) doesn't meet this criterion in either the 
vertical or horizontal dimensions. Furthermore, the 
titlebar element doesn't meet the criterion vertically. 
Such elements can be represented in a bilevel image 
only when they are in precise registration - "locked" - 
to the imaging system's sampling grid. However, images
captured from reality almost never have their elements
precisely aligned with the grid!

Point sampling refers to capture with an infinitesimal 
sampling aperture. This is undesirable in continuous- 
tone imaging. Figure 5.7 in the margin shows what 
would happen if a physical scene like that in Figure 5.1 
were rotated 14°, captured with a point-sampled 
camera, and displayed with a box distribution. The 
alternating on-off elements are rendered with aliasing 
in both the checkerboard portion and the titlebar. 
(Aliasing would be evident even if this image were to 
be reconstructed with a Gaussian.) This example 
emphasizes that in digital imaging, we must represent 
arbitrary scenes, not just scenes whose elements have 
an intimate relationship with the sampling grid. 

A suitable presampling filter would prevent (or at least 
minimize) the Moire artifact of Figure 5.6, and prevent 
or minimize the aliasing of Figure 5.7. When image 
content such as the example titlebar and the desktop 
pattern of Figure 5.2 is presented to a presampling 
filter, blurring will occur. Considering only bitmapped 
images such as Figure 5.1, you might think the blurring
to be detrimental, but to avoid spatial aliasing in 
capturing high-quality continuous-tone imagery, some 
overlap is necessary in the distribution of sensitivity 
across neighboring sensor elements. 

Having introduced the aliasing artifact that results from
poor capture PSFs, we can now return to the display 
and discuss reconstruction PSFs (spot profiles). 
Spot profile

The designer of a display system for continuous-tone 
images seeks to make a display that allows viewing at 
a wide picture angle, with minimal intrusion of artifacts 
such as aliasing or visible scan-line or pixel structure. 
Picture size, viewing distance, spot profile, and scan-line 
or pixel visibility all interact. The display system designer 
cannot exert direct control over viewing distance; spot 
profile is the parameter available for optimization. 

On page 45, I demonstrated the difference between 
a box profile and a Gaussian profile. Figures 5.3 and 5.4 
showed that some overlap between neighboring distri- 
butions is desirable, even though blur is evident when 
the reproduced image is viewed closely. 

When the images of Figure 5.3 or 5.4 are viewed from 
a distance of 10 m (33 feet), a pixel subtends a minute
of arc (1/60°). At this distance, owing to the limited
acuity of human vision, both pairs of images are apparently
identical. Imagine placing beside these images an
emissive display having an infinitesimal spot, producing
the same total flux for a perfectly white pixel. At 10 m,
the pixel structure of the emissive display would be
somewhat visible. At a great viewing distance - say at
a pixel or scan-line subtense of less than 1/180°, corresponding
to SDTV viewed at three times normal
distance - the limited acuity of the human visual 
system causes all three displays to appear identical. As 
the viewer moves closer, different effects become 
apparent, depending upon spot profile. I'll discuss two 
cases: Box distribution and Gaussian distribution. 

Box distribution 

A typical digital projector - such as an LCD or a DMD - 
has a spot profile resembling a box distribution covering 
nearly the entire width and nearly the entire height 
corresponding to the pixel pitch. There is no significant 
gap between image rows or image columns. Each pixel 
has three color components, but the optics of the 
projection device are arranged to cause the distribution 
of light from these components to be overlaid. From 
a great distance, pixel structure will not be visible. 
However, as viewing distance decreases, aliasing ("the 
jaggies") will intrude. Limited performance of projection
lenses mitigates aliasing somewhat; however,
aliasing can be quite noticeable, as in the examples of 
Figures 5.3 and 5.4 on page 45. 




In a typical direct-view digital display, such as an LCD or 
a PDP, each pixel comprises three color components 
that occupy distinct regions of the area corresponding 
to each pixel. Ordinarily, these components are side-by- 
side. There is no significant gap between image rows. 
However, if one component (say green) is turned on 
and the others are off, there is a gap between columns. 
These systems rely upon the limited acuity of the viewer 
to integrate the components into a single colored area. 
At a close viewing distance, the gap can be visible, and 
this can induce aliasing. 

The viewing distance of a display using a box distribu- 
tion, such as a direct-view LCD or PDP, is limited by the 
intrusion of aliasing. 





Figure 5.8 Gaussian spot size. 

Solid lines graph Gaussian 
distributions of intensity 
across two adjacent image 
rows, for three values of spot 
size. The areas under each 
curve are identical. The 
shaded areas indicate their 
sums. In progressive scan- 
ning, adjacent image rows 
correspond to consecutive 
scan lines. In interlaced scan- 
ning, to be described in the 
following chapter, the situa- 
tion is more complex. 



Gaussian distribution 

As I have mentioned, a CRT display has a spot profile 
resembling a Gaussian. The CRT designer's choice of 
spot size involves a compromise illustrated by 
Figure 5.8. 

For a Gaussian distribution with a very small spot, say
a spot width less than half the scan-line pitch, line structure will become evident even at a fairly large viewing
distance.

For a Gaussian distribution with a medium-sized spot, say
a spot width approximately equal to the scan-line pitch, 
the onset of scan-line visibility will occur at a closer 
distance than with a small spot. 

As spot size is increased beyond about twice the scan- 
line pitch, eventually the spot becomes so large that no 
further improvement in line-structure visibility is 
achieved by making it larger. However, there is a serious 
disadvantage to making the spot larger than necessary: 
Sharpness is reduced. 
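The compromise can be explored numerically. The sketch below is a minimal model, not a CRT design tool: it assumes every image row emits an identical Gaussian profile, that rows are one unit (one scan-line pitch) apart, and that "spot width" means the spot diameter at 50% intensity (FWHM). The function names are hypothetical. It reports the fractional dip in summed intensity midway between rows, a rough proxy for line-structure visibility.

```python
import math


def raster_intensity(x, fwhm, n_rows=20):
    """Summed intensity at vertical position x (in units of scan-line pitch).

    Each row k contributes a Gaussian centred at x = k whose full width at
    half maximum (spot diameter at 50% intensity) is fwhm line pitches.
    """
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return sum(
        math.exp(-((x - k) ** 2) / (2.0 * sigma ** 2))
        for k in range(-n_rows, n_rows + 1)
    )


def line_structure_ripple(fwhm):
    """Fractional intensity variation between a row centre and the midpoint.

    0 means a perfectly flat field (no visible line structure).
    """
    on_row = raster_intensity(0.0, fwhm)
    between_rows = raster_intensity(0.5, fwhm)
    return 1.0 - min(on_row, between_rows) / max(on_row, between_rows)


for fwhm in (0.5, 1.0, 2.0):
    print(f"spot width {fwhm} x pitch: ripple {line_structure_ripple(fwhm):.4f}")
```

In this model the ripple is large at half the pitch, modest at one pitch, and negligible at about twice the pitch, so enlarging the spot further buys nothing for line structure while continuing to cost sharpness.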
[Figure 5.9: pixel at 72 ppi, 0.35 mm; CRT spot, 0.63 mm; CRT triad, 0.31 mm]

Figure 5.9 Pixel/spot/triad. 

Triad refers to the smallest 
complete set of red-producing, 
green-producing, and blue- 
producing elements of a 
display. CRT triads have no 
direct relationship to pixels; 
what is usually called dot pitch 
is properly called triad pitch. 



A direct-view color CRT display has several hundred 
thousand, or perhaps a million or more, triads of red, 
green, and blue phosphor dots deposited onto the back 
of the display panel. (A Sony Trinitron CRT has 
a thousand or more vertical stripes of red, green, and 
blue phosphor.) Triad pitch is the shortest distance 
between like-colored triads (or stripes), ordinarily 
expressed in millimeters. There is not a one-to-one rela- 
tionship between pixels and triads (or stripes). A typical 
CRT has a Gaussian spot whose width exceeds both the 
distance between pixels and the distance between 
triads. Ideally, there are many more triads (or stripes) 
across the image width than there are pixels - 1.2 times 
as many, or more. 

You saw at the beginning of this chapter that in order
to avoid visible pixel structure in image display, some
overlap is necessary in the distributions of light
produced by neighboring display elements. Such
overlap reduces sharpness, but by how much? How
much overlap is necessary? I will discuss these issues in
the chapter Resolution, on page 65. First, though,
I introduce the fundamentals of raster scanning.
Raster scanning



6 



I introduced the pixel array on page 6. In video, the 
samples of the pixel array are sequenced uniformly in 
time to form scan lines, which are in turn sequenced in 
time throughout each frame interval. This chapter 
outlines the basics of this process of raster scanning. In 
Chapter 11, Introduction to component SDTV, on 
page 95, I will present details on scanning in conven- 
tional "525-line" and "625-line" video. In Introduction 
to composite NTSC and PAL, on page 103, I will intro- 
duce the color coding used in these systems. In 
Chapter 13, Introduction to HDTV, on page 111, I will
introduce scanning in high-definition television. 

Flicker, refresh rate, and frame rate 

A sequence of still pictures, captured and displayed at 
a sufficiently high rate, can create the illusion of motion. 



Flicker is sometimes redundantly 
called large-area flicker. Take care to 
distinguish flicker, described here, 
from twitter, to be described on 
page 57. See Fukuda, Tadahiko, 
“Some Characteristics of Peripheral 
Vision," NHK Tech. Monograph No. 36 
(Tokyo: NHK Science and Technical
Research Laboratories, Jan. 1987). 



Many displays for moving images emit light for just 
a fraction of the frame time: The display is black for 
a certain duty cycle. If the flash rate - or refresh rate - is 
too low, flicker is perceived. The flicker sensitivity of 
vision is dependent upon the viewing environment: The 
brighter the environment, and the larger the angle 
subtended by the picture, the higher the flash rate must 
be to avoid flicker. Because picture angle influences 
flicker, flicker depends upon viewing distance. 



The brightness of the reproduced image itself influ- 
ences the flicker threshold to some extent, so the 
brighter the image, the higher the refresh rate must be. 
In a totally dark environment, such as the cinema, 
The fovea has a diameter of about
1.5 mm, and subtends a visual angle 
of about 5°. 




Figure 6.1 Dual-bladed 

shutter in a film projector 
flashes each frame twice. Rarely, 
3-bladed shutters are used; they 
flash each frame thrice. 

Television refresh rates were 
originally chosen to match the 
local AC power line frequency. 

See Frame, field, line, and sample 
rates, on page 371.



Farrell, Joyce E., et al., “Predicting 
Flicker Thresholds for Video 
Display Terminals," in Proc. 
Society for Information Display 
28 (4): 449-453 (1987). 



Viewing environment  Ambient illumination  Refresh (flash) rate, Hz         Frame rate, Hz
Cinema               Dark                  48                               24
Television           Dim                   50                               25
Television           Dim                   ≈60                              ≈30
Office               Bright                various, e.g., 66, 72, 76, 85    same as refresh rate



Table 6.1 Refresh rate refers to the shortest interval over 
which the whole picture is displayed - the flash rate. 



flicker sensitivity is completely determined by the 
luminance of the image itself. Peripheral vision has 
higher temporal sensitivity than central (foveal) vision, 
so the flicker threshold increases to some extent with 
wider viewing angles. Table 6.1 summarizes refresh 
rates used in film, video, and computing: 

In the darkness of a cinema, a flash rate of 48 Hz is 
sufficient to overcome flicker. In the early days of 
motion pictures, a frame rate of 48 Hz was thought 
to involve excessive expenditure for film stock, and 
24 frames per second were found to be sufficient for 
good motion portrayal. So, a conventional film 
projector uses a dual-bladed shutter, depicted in 
Figure 6.1, to flash each frame twice. Higher realism can 
be obtained with single-bladed shutters at 60 frames 
per second or higher. 

In the dim viewing environment typical of television, 
such as a living room, a flash rate of 60 Hz suffices. The 
interlace technique, to be described on page 56, 
provides for video a function comparable to the dual- 
bladed shutter of a film projector: Each frame is flashed 
as two fields. Refresh is established by the field rate 
(twice the frame rate). For a given data rate, interlace 
doubles the apparent flash rate, and provides improved 
motion portrayal by doubling the temporal sampling 
rate. Scanning without interlace is called progressive. 

A computer display used in a bright environment such 
as an office may require a refresh rate above 70 Hz to 
overcome flicker. (See Farrell.) 
Introduction to scanning

In Flicker, refresh rate, and frame rate, on page 51,
I outlined how refresh rate is chosen so as to avoid
flicker. In Viewing distance and angle, on page 8, I
outlined how spatial sampling determines the number of
pixels in the pixel array. Video scanning represents
pixels in sequential order, so as to acquire, convey, 
process, or display every pixel during the fixed time 
interval associated with each frame. 



The word raster is derived from the
Latin word rastrum (rake), owing to
the resemblance of a raster to the 
pattern left on newly raked sand. 



Line is a heavily overloaded term. 
Lines may refer to the total number 
of raster lines: Figure 6.2 shows 
“525-line" video, which has 525 
total lines. Line may refer to a line 
containing picture, or to the total 
number of lines containing picture - 
in this example, 480. Line may 
denote the AC power line, whose 
frequency is very closely related to 
vertical scanning. Finally, lines is 
a measure of resolution, to be 
described in Resolution, on page 65. 



In analog video, information in the image plane is 
scanned left to right at a uniform rate during a fixed, 
short interval of time - the active line time. Scanning 
establishes a fixed relationship between a position in 
the image and a time instant in the signal. Successive 
lines are scanned at a uniform rate from the top to the 
bottom of the image, so there is also a fixed relation- 
ship between vertical position and time. 

The stationary pattern of parallel scanning lines 
disposed across the image is the raster. Digital video 
conveys samples of the image matrix in the same order 
that information would be conveyed in analog video: 
first the top line (left to right), then successive lines. 
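Because scanning ties each image position to a time instant, the mapping can be written directly. This is a sketch assuming a progressive raster with a uniform sampling rate, where row and column are counted from the first sample of the first total line (so blanking samples are included). The 858-sample line and 13.5 MHz rate in the usage lines are assumed values typical of 480i studio sampling, not figures from this section; the function names are hypothetical.

```python
def sample_time(row, col, samples_per_total_line, sampling_rate_hz):
    """Seconds from the start of the frame to the sample at (row, col).

    Samples are sent in raster order: left to right along each total line,
    then successive lines from top to bottom.
    """
    index = row * samples_per_total_line + col
    return index / sampling_rate_hz


def sample_position(t, samples_per_total_line, sampling_rate_hz):
    """Inverse mapping: (row, col) of the sample taken at time t seconds."""
    index = round(t * sampling_rate_hz)
    return divmod(index, samples_per_total_line)


S_TL = 858       # assumed samples per total line
F_S = 13.5e6     # assumed sampling rate, Hz

t = sample_time(10, 100, S_TL, F_S)
print(f"row 10, column 100 is sampled {t * 1e6:.3f} us into the frame")
print(sample_position(t, S_TL, F_S))
```

With interlace, the row-to-time mapping would additionally depend on which field a row belongs to; the sketch covers only the progressive case.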

In cameras and displays, a certain time interval is 
consumed in advancing the scanning operation - in 
retracing - from one line to the next. Several line times 
are consumed by vertical retrace, from the bottom of 
one scan to the top of the next. A CRT's electron gun 
must be switched off (blanked) during these intervals, 
so they are called blanking intervals. The horizontal 
blanking interval occurs between scan lines; the vertical 
blanking interval (VBI) occurs between frames (or 
fields). Figure 6.2 overleaf shows the blanking intervals 
of "525-line'' video. The horizontal and vertical 
blanking intervals required for a CRT are large fractions 
of the line time and frame time: Vertical blanking 
consumes roughly 8% of each frame period. 

In an analog video interface, synchronization information (sync) is conveyed during the blanking intervals. In
principle, a digital video interface could omit blanking
intervals and use an interface clock corresponding just
to the active pixels. However, this would be impractical, because it would lead to two clock domains in
equipment that required blanking intervals, and this
would cause unnecessary complexity in logic design.
Instead, digital video interfaces use clock frequencies
chosen to match the large blanking intervals of typical
display equipment. What would otherwise be excess
data capacity is put to good use conveying audio
signals, captions, test signals, error detection or correction information, or other data or metadata.

Figure 6.2 Blanking intervals for "525-line" video are
indicated here by a dark region surrounding a
light-shaded rectangle that represents the picture. The
vertical blanking interval (VBI) consumes about 8% of
each field time; horizontal blanking consumes about
15% of each line time.

[Figure 6.2: raster of 525 total lines; vertical blanking interval (≈8%); horizontal blanking interval (≈15%)]

The count of 480 picture lines in Figure 6.2 is a recent
standard; some people would say 483 or 487. See
Picture lines, on page 324.

Scanning parameters 

In progressive scanning, all of the lines of the image are 
scanned in order, from top to bottom, at a picture rate 
sufficient to portray motion. Figure 6.3 at the top of the 
facing page indicates four basic scanning parameters: 

• Total lines (L_T) comprises all of the scan lines, that is,
both the vertical blanking interval and the picture lines.

• Active lines (L_A) contain the picture.

• Samples per total line (S_TL) comprises the samples in the
total line, including horizontal blanking.

• Samples per active line (S_AL) counts samples that are
permitted to take values different from blanking level.

The production aperture, sketched in Figure 6.3, 
comprises the array S_AL columns by L_A rows. The
samples in the production aperture comprise the pixel 
array; they are active. All other sample intervals 
comprise blanking; they are inactive (or blanked), 
though they may convey vertical interval information 
Figure 6.3 Production aperture

comprises the array S_AL columns
by L_A rows. Blanking intervals
lie outside the production aperture; here, blanking intervals
are darkly shaded. The product
of S_AL and L_A yields the active
pixel count per frame. Sampling
rate (f_S) is the product of S_TL,
L_T, and frame rate.



Figure 6.4 Clean aperture 

should remain subjectively free 
from artifacts arising from 
filtering. The clean aperture 
excludes blanking transition 
samples, indicated here by black 
bands outside the left and right 
edges of the picture width, 
defined by the count of samples 
per picture width (S_PW).



The horizontal center of the picture 
lies midway between the central 
two luma samples, and the vertical 
center of the picture lies vertically 
midway between two image rows. 



See Transition samples, on page 323. 




[Figures 6.3 and 6.4: production aperture (S_AL x L_A); clean aperture]



such as VITS, VITC, or closed captions. Consumer 
display equipment must blank these lines, or place 
them offscreen. 
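The relationships in Figure 6.3 reduce to a few products. The sketch below assumes the Rec. ITU-R BT.601 studio parameters for 480i (858 samples per total line, 525 total lines, 720 active samples, 480 active lines, frame rate 30000/1001 Hz) and for 576i (864, 625, 720, 576, 25 Hz); these values come from that standard rather than from this section, and `scanning_summary` is a hypothetical helper.

```python
from fractions import Fraction


def scanning_summary(s_tl, l_t, s_al, l_a, frame_rate):
    """Derive basic figures from the four scanning parameters.

    Returns (active pixels per frame, sampling rate in Hz,
    fraction of total lines devoted to vertical blanking).
    """
    active_pixels = s_al * l_a                # production aperture, S_AL x L_A
    sampling_rate = s_tl * l_t * frame_rate   # f_S = S_TL x L_T x frame rate
    vbi_fraction = (l_t - l_a) / l_t
    return active_pixels, sampling_rate, vbi_fraction


for name, params in {
    "480i": (858, 525, 720, 480, Fraction(30000, 1001)),
    "576i": (864, 625, 720, 576, 25),
}.items():
    pixels, f_s, vbi = scanning_summary(*params)
    print(f"{name}: {pixels} active pixels, f_S = {float(f_s) / 1e6} MHz, "
          f"VBI {vbi:.1%} of lines")
```

Under these assumptions both systems come out at exactly 13.5 MHz, and the 480i vertical blanking share (about 8.6% of lines with 480 picture lines) is in line with the roughly 8% quoted earlier.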

At the left-hand edge of picture information on a scan 
line, if the video signal immediately assumes a value 
greatly different from blanking, an artifact called ringing 
is liable to result when that transition is processed 
through an analog or digital filter. A similar circum- 
stance arises at the right-hand picture edge. In studio 
video, the signal builds to full amplitude, or decays to 
blanking level, over several transition samples ideally 
forming a raised cosine shape. 
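A raised-cosine transition can be sketched as a gain ramp applied to the first and last few active samples of a line. This is a minimal illustration, assuming the signal is expressed relative to blanking level (so blanking is 0); the number of transition samples and the half-sample offset of the ramp are illustrative choices, not values from any standard, and the function names are hypothetical.

```python
import math


def raised_cosine_edge(n_samples):
    """Gain ramp rising from near 0 to near 1 over n_samples.

    Gains follow half a cycle of a cosine, evaluated at sample centres.
    """
    return [0.5 - 0.5 * math.cos(math.pi * (i + 0.5) / n_samples)
            for i in range(n_samples)]


def apply_transitions(line, n_transition):
    """Taper both ends of an active line toward blanking level (0)."""
    if len(line) < 2 * n_transition:
        raise ValueError("line too short for the requested transitions")
    ramp = raised_cosine_edge(n_transition)
    out = list(line)
    for i, gain in enumerate(ramp):
        out[i] *= gain          # build up at the left-hand edge
        out[-1 - i] *= gain     # decay at the right-hand edge
    return out


print([round(v, 3) for v in apply_transitions([1.0] * 12, 4)])
```

The smooth ramp carries much less high-frequency energy than an abrupt step from blanking to full amplitude, which is why it is less liable to provoke ringing in subsequent filters.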

Active samples encompass not only the picture, but 
also the transition samples; see Figure 6.4 above. 

Studio equipment should maintain the widest picture 
possible within the production aperture, subject to 
appropriate blanking transitions. 
Figure 6.5 Interlaced scanning forms a complete
picture - the frame - from 
two fields, each comprising 
half of the total number of 
scanning lines. The second 
field is delayed by half the 
frame time from the first. This 
example shows 525 lines. 



I detailed spot profile in Image 
structure, on page 43. 



It is confusing to refer to fields as 
odd and even. Use first field and 
second field instead. 



RCA trademarked the word 
ProScan, but RCA (now Thomson)
confusingly uses that word to
describe both progressive and 
interlaced television receivers! 




Interlaced scanning 

I have treated the image array as a matrix of S_AL by L_A
pixels, without regard for the spatial distribution of light
intensity across each pixel - the spot profile. If spot 
profile is such that there is a significant gap between 
the intensity distributions of adjacent image rows (scan 
lines), then scan-line structure will be visible to viewers 
closer than a certain distance. The gap between scan 
lines is a function of scan-line pitch and spot profile. 
Spot size can be characterized by spot diameter at 50% 
intensity. For a given scan-line pitch, a smaller spot size 
will force viewers to be more distant from the display 
if scan lines are to be rendered invisible. 

Interlacing is a scheme by which we can reduce spot 
size without being thwarted by scan-line visibility. The 
full height of the image is scanned with a narrow spot, 
leaving gaps in the vertical direction. Then, 1/50 s or 1/60 s
later, the full image height is scanned again, but offset
vertically so as to fill in the gaps. A frame now 
comprises two fields, denoted first and second. The 
scanning mechanism is depicted in Figure 6.5 above. 

For a given level of scan-line visibility, this technique 
enables closer viewing distance than would be possible 
for progressive display. Historically, the same raster
standard was used across an entire television system, so 
interlace was used not only for display but also for 
acquisition, recording, and transmission. 

Noninterlaced (progressive or sequential) scanning is 
universal in desktop computers and in computing. 
Progressive scanning has been introduced for digital 
television and HDTV. However, the interlace technique
remains ubiquitous in conventional broadcast television, and is dominant in HDTV.
Figure 6.6 Twitter would
result if this scene were 
scanned at the indicated line 
pitch by a camera without 
vertical filtering, then 
displayed using interlace. 



The flicker susceptibility of vision stems from a wide- 
area effect: As long as the complete height of the 
picture is scanned sufficiently rapidly to overcome 
flicker, small-scale picture detail, such as that in the 
alternate lines, can be transmitted at a lower rate. With 
progressive scanning, scan-line visibility limits the 
reduction of spot size. With interlaced scanning, this 
constraint is relaxed by a factor of two. However, inter- 
lace introduces a new constraint, that of twitter. 

If an image has vertical detail at a scale comparable to 
the scanning line pitch - for example, if the fine pattern 
of horizontal line pairs in Figure 6.6 is scanned - then 
interlaced display causes the content of the first and the 
second fields to differ markedly. At practical field rates - 
50 or 60 Hz - this causes twitter, a small-scale phenom- 
enon that is perceived as a scintillation, or an extremely 
rapid up-and-down motion. If such image information 
occupies a large area, then flicker is perceived instead 
of twitter. Twitter is sometimes called interline flicker, 
but that is a bad term because flicker is by definition 
a wide-area effect. 



Twitter is produced not only from degenerate images 
such as the fine black-and-white lines of Figure 6.6, but 
also from high-contrast vertical detail in ordinary 
images. High-quality video cameras include optical 
spatial lowpass filters to attenuate vertical detail that 
would otherwise be liable to produce twitter. When 
computer-generated imagery (CGI) is interlaced, vertical 
detail must be filtered in order to avoid flicker. A circuit 
to accomplish this is called a twitter filter. 
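The effect of vertical filtering on twitter can be seen with the line-pair pattern of Figure 6.6. The sketch below uses a three-tap [1, 2, 1]/4 vertical lowpass filter with edge rows replicated; that kernel is one simple choice and is an assumption here, since practical twitter filters differ. Images are lists of rows, and the function names are hypothetical.

```python
def vertical_lowpass_121(image):
    """Apply a [1, 2, 1]/4 filter down each column; edge rows are replicated."""
    rows = len(image)
    out = []
    for r in range(rows):
        above = image[max(r - 1, 0)]
        below = image[min(r + 1, rows - 1)]
        out.append([(a + 2 * c + b) / 4 for a, c, b in zip(above, image[r], below)])
    return out


def field_means(image):
    """Mean value of the first field (rows 0, 2, ...) and second field (rows 1, 3, ...)."""
    first = [v for row in image[0::2] for v in row]
    second = [v for row in image[1::2] for v in row]
    return sum(first) / len(first), sum(second) / len(second)


# Alternating white and black image rows, as in Figure 6.6.
line_pairs = [[1.0] * 8 if r % 2 == 0 else [0.0] * 8 for r in range(8)]

print(field_means(line_pairs))                        # (1.0, 0.0): fields differ completely
print(field_means(vertical_lowpass_121(line_pairs)))  # fields now nearly equal
```

Away from the top and bottom edges every filtered row is mid-grey, so the two fields carry essentially the same content and nothing twitters; the price is that the fine line pattern itself is gone.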

Interlace in analog systems 

Interlace is achieved in analog devices by scanning 
vertically at a constant rate between 50 and 60 Hz, and 
scanning horizontally at an odd multiple of half that 
rate. In SDTV in North America and Japan, the field rate 
is 59.94 Hz; line rate (f_H) is 525/2 (262½) times that
rate. In Asia, Australia, and Europe, the field rate is
50 Hz; the line rate is 625/2 (312½) times the field rate.
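The line-rate arithmetic can be checked exactly with rational numbers. This sketch assumes the 59.94 Hz field rate is exactly 60/1.001 Hz, that is, 60000/1001 Hz; the helper name is hypothetical.

```python
from fractions import Fraction


def line_rate(total_lines, field_rate):
    """Line rate f_H: each field contains half of the total lines."""
    return Fraction(total_lines, 2) * field_rate


f_h_525 = line_rate(525, Fraction(60000, 1001))   # 262 1/2 lines per 59.94 Hz field
f_h_625 = line_rate(625, 50)                      # 312 1/2 lines per 50 Hz field

print(f"525/59.94: f_H = {float(f_h_525):.3f} Hz")
print(f"625/50:    f_H = {float(f_h_625):.3f} Hz")
```

Because 525 and 625 are odd, each field ends with a half line, and that half line is what offsets the second field vertically by half a line pitch.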
Figure 6.7 Horizontal and
vertical drive pulses effect
interlace in analog scanning.
0_V denotes the start of each
field. The halfline offset of
the second 0_V causes interlace. Here, 576i scanning is
shown.



Details will be presented in 
Analog SDTV sync, genlock, and 
interface on page 399. 
Figure 6.7 above shows the horizontal drive (HD) and
vertical drive (VD) pulse signals that were once distrib- 
uted in the studio to cause interlaced scanning in 
analog equipment. These signals have been superseded 
by a combined sync (or composite sync) signal; vertical 
scanning is triggered by broad pulses having total duration of 2½ or 3 lines. Sync is usually imposed onto the
video signal, to avoid separate distribution circuits.
Analog sync is coded at a level "blacker than black."

Interlace and progressive 

For a given viewing distance, sharpness is improved as 
spot size becomes smaller. However, if spot size is 
reduced beyond a certain point, depending upon the 
spot profile of the display, either scan lines or pixels will 
become visible, or aliasing will intrude. In principle, 
improvements in bandwidth or spot profile reduce 
potential viewing distance, enabling a wider picture 
angle. However, because consumers form expectations 
about viewing distance, we assume a constant viewing 
distance and say that resolution is improved instead. 

A rough conceptual comparison of progressive and 
interlaced scanning is presented in Figure 6.8 opposite. 
At first glance, an interlaced system offers twice the 
number of pixels - loosely, twice the spatial resolu- 
tion - as a progressive system with the same data 
capacity and the same frame rate. Owing to twitter, 
spatial resolution in a practical interlaced system is not 
double that of a progressive system at the same data 
rate. Historically, cameras have been designed to avoid 
producing so much vertical detail that twitter would be 
objectionable. However, resolution is increased by 
a factor large enough that interlace has historically been 
Figure 6.8 Progressive and interlaced scanning are compared. The top left sketch depicts an
image of 4x3 pixels transmitted during an interval of 1/60 s. The top center sketch shows image
data from the same 12 locations transmitted in the following 1/60 s interval. The top right sketch
shows the spatial arrangement of the 4x3 image, totalling 12 pixels; the data rate is 12 pixels per
1/60 s. At the bottom left, 12 pixels comprising image rows 0 and 2 of a 6x4 image array are transmitted in 1/60 s. At the bottom center, the 12 pixels of image rows 1 and 3 are transmitted in the
following 1/60 s interval. At the bottom right, the spatial arrangement of the 6x4 image is shown:
The 24-pixel image is transmitted in 1/30 s. Interlaced scanning has the same data rate as progressive, but at first glance has twice the number of pixels, and potentially twice the resolution.



Notation   Pixel array
VGA        640x480
SVGA       800x600
XGA        1024x768
SXGA       1280x1024
UXGA       1600x1200
QXGA       2048x1536



Table 6.2 Scanning in 
computing has no standard- 
ized notation, but these 
notations are widely used. 



considered worthwhile. The improvement comes at the 
expense of introducing some aliasing and some vertical 
motion artifacts. Also, interlace makes it difficult to 
process motion sequences, as I will explain on page 61. 
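The data-rate comparison of Figure 6.8 amounts to simple bookkeeping. The short sketch below restates the figure's toy sizes (a 4x3 progressive image and a 6x4 interlaced image, each delivering 12 pixels per 1/60 s interval); the variable names are illustrative.

```python
from fractions import Fraction

interval = Fraction(1, 60)   # one field (or progressive frame) time, seconds

prog_w, prog_h = 4, 3        # progressive: the whole 4x3 image every interval
intl_w, intl_h = 6, 4        # interlaced: half of the 6x4 image's rows every interval

prog_rate = prog_w * prog_h / interval          # pixels per second
intl_rate = intl_w * (intl_h // 2) / interval   # pixels per second

assert prog_rate == intl_rate == 720            # same data rate...
print(intl_w * intl_h / (prog_w * prog_h))      # ...but twice the pixels per frame: 2.0
```

As the text cautions, twice the pixels does not translate into twice the usable resolution once twitter and vertical filtering are taken into account.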

Scanning notation 

In computing, display format may be denoted by a pair 
of numbers: the count of pixels across the width of the 
image, and the number of picture lines. Alternatively, 
display format may be denoted symbolically - VGA, 
SVGA, XGA, etc., as in Table 6.2. Square sampling is 
implicit. This notation does not indicate refresh rate. 
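Because square sampling is implicit in computing notation, aspect ratio follows from the pixel counts alone. A minimal sketch, with a hypothetical helper name:

```python
from math import gcd


def aspect_ratio(width, height):
    """Reduce a square-sampled pixel array to its aspect ratio, width:height."""
    g = gcd(width, height)
    return f"{width // g}:{height // g}"


for name, (w, h) in {"VGA": (640, 480), "XGA": (1024, 768), "SXGA": (1280, 1024)}.items():
    print(name, aspect_ratio(w, h))
```

SXGA comes out at 5:4 rather than 4:3, so the formats listed in Table 6.2 do not all share one aspect ratio.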

Traditionally, video scanning was denoted by the total
number of lines per frame (picture lines plus sync and
vertical blanking overhead), a slash, and the field rate in
[Figure 6.9: computing notation 640x480; video notation 525/59.94; new notation 480i29.97]

Figure 6.9 My scanning 

notation gives the count 
of active (picture) lines, p for 
progressive or i for interlace,
then the frame rate. Because
some people write 480p60
when they mean 480p59.94,
the notation 60.00 should 
be used to emphasize a rate 
of exactly 60 Hz. 



Since all 480i systems have a frame
rate of 29.97 Hz, I use 480i as
shorthand for 480i29.97. Similarly,
I use 576i as shorthand for 576i25.




Figure 6.10 Test scene 



hertz. (Interlace is implicit unless a slash and 1:1 is
appended to indicate progressive scanning; a slash and
2:1 makes interlace explicit.) 525/59.94/2:1 scanning is
used in North America and Japan; 625/50/2:1 prevails 
in Europe, Asia, and Australia. Until very recently, these 
were the only scanning systems used for broadcasting. 

Recently, digital technology has enabled several new 
scanning standards. Conventional scanning notation 
cannot adequately describe the new scanning systems, 
and a new notation is emerging, depicted in Figure 6.9: 
Scanning is denoted by the count of active (picture)
lines, followed by p for progressive or i for interlace,
followed by the frame rate. I write the letter i in lowercase, and in italics, to avoid potential confusion with
the digit 1. For consistency, I also write the letter p in
lowercase italics. Traditional video notation (such as
625/50) is inconsistent, juxtaposing lines per frame with
fields per second. Some people seem intent upon
carrying this confusion into the future, by denoting the
old 525/59.94 as 480i59.94. In my notation, I use
frame rate.

In my notation, conventional 525/59.94/2:1 video is
denoted 480i29.97; conventional 625/50/2:1 video is
denoted 576i25. HDTV systems include 720p60 and
1080i30. Film-friendly versions of HDTV are denoted
720p24 and 1080p24. Aspect ratio is not explicit in the
new notation: 720p, 1080i, and 1080p are implicitly
16:9 since there are no 4:3 standards for these systems,
but 480i30.00 or 480p60.00 could potentially have
either conventional 4:3 or widescreen 16:9 aspect ratio.
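The notation is regular enough to parse mechanically. The sketch below is a hypothetical parser, not part of any standard: it accepts strings such as 480i29.97 or 720p60, and derives the field (refresh) rate on the assumption that interlace delivers two fields per frame. Converting the rate to a float discards the distinction between 60 and 60.00 discussed above, so a real tool might keep the original text as well.

```python
import re
from dataclasses import dataclass

# <active lines><i or p><frame rate>, e.g. "1080i30" or "480p59.94"
_NOTATION = re.compile(r"^(\d+)([ip])(\d+(?:\.\d+)?)$")


@dataclass
class Scanning:
    active_lines: int
    progressive: bool
    frame_rate: float

    @property
    def field_rate(self):
        """Fields per second: equal to frame rate if progressive, double if interlaced."""
        return self.frame_rate if self.progressive else 2 * self.frame_rate


def parse_scanning(text):
    match = _NOTATION.match(text)
    if match is None:
        raise ValueError(f"not in <lines><i|p><frame rate> form: {text!r}")
    lines, scan, rate = match.groups()
    return Scanning(int(lines), scan == "p", float(rate))


for text in ("480i29.97", "576i25", "720p60", "1080i30", "1080p24"):
    s = parse_scanning(text)
    print(text, s.active_lines, "progressive" if s.progressive else "interlaced",
          f"{s.field_rate:g} fields/s")
```

A traditional designation such as 525/59.94 is rejected with a ValueError, since it counts total lines per frame rather than active lines.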

Interlace artifacts 

An interlaced camera captures 60 (or 50) unique fields 
per second. If a scene contains an object in motion with 
respect to the camera, each field carries half the 
object's spatial information, but information in the 
second field will be displaced according to the object's 
motion. 

Consider the test scene sketched in Figure 6.10, 
comprising a black background partially occluded by 
a white disk that is in motion with respect to the 




Figure 6.11 Interlaced capture 

samples the position of a 
football at about 60 times 
per second, even though 
frames occur at half that rate. 

(A soccer ball takes 
50 positions per second.) 




Figure 6.12 Static lattice 

approach to stitching two 
fields into a frame produces 
the "mouse's teeth" or "field 
tearing" artifact on moving 
objects. 



camera. The first and second fields imaged from this 
scene are illustrated in Figure 6.11. (The example 
neglects capture blur owing to motion during the expo- 
sure; it resembles capture by a CCD camera set for 
a very short exposure time.) The image in the second 
field is delayed with respect to the first by half the 
frame time (that is, by 1/60 s or 1/50 s); by the time the
second field is imaged, the object has moved. 

Upon interlaced display, the time sequence of inter- 
laced fields is maintained: No temporal or spatial arti- 
facts are introduced. However, reconstruction of 
progressive frames is necessary for high-quality resizing, 
repositioning, upconversion, downconversion, or stan- 
dards conversion. You can think of an interlaced signal 
as having its lines rearranged (permuted) compared to 
a progressive signal; however, in the presence of 
motion, simply stitching two fields into a single frame 
produces spatial artifacts such as that sketched in 
Figure 6.12. Techniques to avoid such artifacts will be 
discussed in Deinterlacing, on page 437. 
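The static-lattice approach of Figure 6.12 is easy to reproduce. In this sketch a frame is a list of rows; the first field takes rows 0, 2, 4, ... and the second field rows 1, 3, 5, .... The test scene is a hypothetical white bar on black that moves two columns between field captures, standing in for the disk of Figure 6.10.

```python
def split_fields(frame):
    """Return (first field, second field): even-numbered and odd-numbered rows."""
    return frame[0::2], frame[1::2]


def weave(first, second):
    """Static-lattice reconstruction: interleave the rows of two fields."""
    frame = []
    for a, b in zip(first, second):
        frame.extend([a, b])
    frame.extend(first[len(second):])   # first field may hold one extra row
    return frame


def scene(shift, rows=6, cols=10, width=3):
    """Hypothetical test scene: a white bar `width` columns wide, starting at column `shift`."""
    return [[1 if shift <= c < shift + width else 0 for c in range(cols)]
            for _ in range(rows)]


first, _ = split_fields(scene(shift=2))    # first field, captured at time 0
_, second = split_fields(scene(shift=4))   # second field, half a frame later
woven = weave(first, second)

for row in woven:
    print("".join("#" if v else "." for v in row))
```

For a static scene, weaving the two fields restores the frame exactly; with motion, alternate rows disagree and the edges of the bar show the comb-like "mouse's teeth" pattern.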

Examine the interlaced (bottom) portion of Figure 6.8, 
on page 59, and imagine an image element moving 
slowly down the picture at a rate of one row of the 
pixel array every field time - in a 480i29.97 system,
1/480 of the picture height in 1/60 s, or one picture
height in 8 seconds. Owing to interlace, half of that 
image's vertical information will be lost! At other rates, 
some portion of the vertical detail in the image will be 
lost. With interlaced scanning, vertical motion can 
cause serious motion artifacts. 
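The loss can be demonstrated with a small model. This sketch assumes the first field samples even-numbered rows of the pixel array and the second field odd-numbered rows, and that the image moves down exactly one row per field, so content row k lies at array row k + n during field n. The function name is hypothetical, and the 60000/1001 Hz field rate is an assumed exact value for 59.94 Hz.

```python
def captured_content_rows(n_fields, content_rows):
    """Content rows of a downward-moving image that interlaced capture ever samples."""
    seen = set()
    for n in range(n_fields):
        field_parity = n % 2                  # 0: first field, 1: second field
        for k in range(content_rows):
            if (k + n) % 2 == field_parity:   # content row k is at array row k + n
                seen.add(k)
    return seen


print(sorted(captured_content_rows(n_fields=100, content_rows=8)))  # [0, 2, 4, 6]

# Time to traverse 480 rows at one row per field, at 60000/1001 fields per second.
seconds_per_height = 480 / (60000 / 1001)
print(f"{seconds_per_height:.3f} s")
```

However many fields elapse, the odd content rows always land on the rows that the current field skips.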



Motion portrayal 

In Flicker, refresh rate, and frame rate, on page 51,
I outlined the perceptual considerations in choosing
refresh rate. In order to avoid objectionable flicker, it is 
necessary to flash an image at a rate higher than the 
rate necessary to portray its motion. Different applica- 
tions have adopted different refresh rates, depending 
on the image quality requirements and viewing condi- 
tions. Refresh rate is generally engineered into a video 
system; once chosen, it cannot easily be changed. 
Poynton, Charles, "Motion
portrayal, eye tracking, and 
emerging display technology," in 
Proc. 30th SMPTE Advanced 
Motion Imaging Conference (New 
York: SMPTE, 1996), 192-202. 



Historically, this was called 
3-2 pulldown, but with the 
adoption of SMPTE RP 197, it is 
now more accurately called 
2-3 pulldown. See page 430. 



The progressive segmented-frame 
(PsF) technique is known in 
consumer SDTV systems as quasi- 
interlace. PsF is not to be confused
with point spread function (PSF).



Flicker is minimized by any display device that produces 
steady, unflashing light for the duration of the frame 
time. You might regard a nonflashing display to be more 
suitable than a device that flashes; many modern 
devices do not flash. However, if the viewer's gaze 
tracks an element that moves across the image, 
a display with a pixel duty cycle near 100% - that is, an 
on-time approaching the frame time - will exhibit 
smearing of that element. This problem becomes more 
severe as eye tracking velocities increase, such as with 
the wide viewing angle of HDTV. 
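A first-order way to quantify the smearing: while the eye tracks an element moving at constant speed, each held pixel is swept across the retina for as long as it stays lit. The sketch below assumes exactly that simple model (smear length = speed x on-time) and ignores the spot profile and the eye's own blur; the speed in the usage lines, an element crossing 1920 pixels in 4 s, is a hypothetical example.

```python
def hold_smear_pixels(speed_px_per_s, frame_rate_hz, duty_cycle):
    """Approximate smear length, in pixels, seen by an eye tracking a moving element.

    on_time is the portion of each frame during which the pixel emits light.
    """
    on_time = duty_cycle / frame_rate_hz
    return speed_px_per_s * on_time


speed = 1920 / 4.0   # hypothetical: element crosses 1920 pixels in 4 seconds
for duty in (1.0, 0.5, 0.1):
    print(f"duty cycle {duty:.0%}: smear about {hold_smear_pixels(speed, 60, duty):.1f} px")
```

In this model, shortening the on-time reduces smear in proportion, at the cost of reintroducing the flashing that a steady display avoids.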

Film at 24 frames per second is transferred to inter- 
laced video at 60 fields per second by 2-3 pulldown. 
The first film frame is transferred to two video fields, 
then the second film frame is transferred to three video 
fields; the cycle repeats. The 2-3 pulldown is normally 
used to produce video at 59.94 Hz, not 60 Hz; the film 
is run 0.1% slower than 24 frames per second. I will 
detail the scheme in 2-3 pulldown, on page 429. The 
2-3 technique can be applied to transfer to progressive 
video at 59.94 or 60 frames per second. Film is transferred to 576i video using 2-2 pulldown: Each film
frame is scanned into two video fields (or frames); the 
film is run 4% fast. 
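The 2-3 cadence and the associated rates can be sketched as follows. The film frames are labelled with hypothetical letters, and the 0.1% slowdown is modelled as an exact factor of 1000/1001, an assumption consistent with the 59.94 Hz field rate.

```python
from fractions import Fraction


def pulldown_2_3(film_frames):
    """Map film frames to interlaced video fields using the 2, 3, 2, 3, ... cadence."""
    fields = []
    for i, frame in enumerate(film_frames):
        fields.extend([frame] * (2 if i % 2 == 0 else 3))
    return fields


fields = pulldown_2_3(["A", "B", "C", "D"])
print(fields)   # 4 film frames become 10 video fields

film_rate = 24 * Fraction(1000, 1001)              # film run about 0.1% slow
field_rate = film_rate * Fraction(len(fields), 4)  # 10 fields per 4 film frames
print(field_rate, float(field_rate))               # 60000/1001, about 59.94 Hz

speedup_2_2 = Fraction(25, 24)                     # 576i: one film frame per 1/25 s
print(f"2-2 pulldown runs film {float(speedup_2_2 - 1):.1%} fast")
```

The 25/24 ratio works out to about 4.2%, the "4% fast" figure quoted above.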

Segmented frame (24PsF) 

A scheme called progressive segmented-frame has been 
adopted to adapt HDTV equipment to handle images at 
24 frames per second. The scheme, denoted 24PsF, 
samples in progressive fashion: Both fields represent the 
same instant in time, and vertical filtering to reduce 
twitter is both unnecessary and undesirable. However, 
lines are rearranged to interlaced order for studio distri- 
bution and recording. Proponents of the scheme claim 
compatibility with interlaced processing and recording 
equipment, a dubious objective in my view. 

Video system taxonomy 

Insufficient channel capacity was available at the outset 
of television broadcasting to transmit three separate 
color components. The NTSC and PAL techniques were 
devised to combine (encode) the three color components into a single composite signal. Composite video
Table 6.3 Video systems are
classified as analog or digital, 
and component or composite 
(or S-video). SDTV may be 
represented in component, 
hybrid (S-video), or composite 
forms. HDTV is always in 
component form. (Certain 
degenerate forms of analog 
NTSC and PAL are itemized in 
Table 49.1, on page 581 .) 



The 4fSC notation will be
introduced on page 108. 



By NTSC, I do not mean 525/59.94 
or 480i; by PAL, I do not mean
625/50 or 576i! See Introduction
to composite NTSC and PAL, on 
page 103. Although SECAM is 
a composite technique in that luma 
and chroma are combined, it has 
little in common with NTSC and 
PAL. SECAM is obsolete for video 
production; see page 576. 







                         Analog                  Digital
HDTV   Component         R'G'B', 709 Y'PBPR      4:2:2 709 Y'CBCR
SDTV   Component         R'G'B', 601 Y'PBPR      4:2:2 601 Y'CBCR
SDTV   Hybrid            S-video
SDTV   Composite         NTSC, PAL               4fSC



remains in use for analog broadcast and in consumers' 
premises; much composite digital (4fSC) equipment is
still in use by broadcasters in North America. However,
virtually all new video equipment - including all
consumer digital video equipment, and all HDTV equipment - uses component video, either Y'PBPR analog
components or Y'CBCR digital components.

A video system can be classified as component HDTV, 
component SDTV, or composite SDTV. Independently, 
a system can be classified as analog or digital. Table 6.3 
above indicates the six classifications, with the associ- 
ated color encoding schemes. Composite NTSC and PAL 
video encoding is used only in 480i and 576i systems;
HDTV systems use only component video. S-video is 
a hybrid of component analog video and composite 
analog NTSC or PAL; in Table 6.3, S-video is classified in 
its own seventh (hybrid) category. 

Conversion among systems 

In video, encoding traditionally referred to converting
a set of R'G'B' components into an NTSC or PAL
composite signal. Encoding may start with R'G'B',
Y'CBCR, or Y'PBPR components, or may involve matrixing
from R'G'B' to form luma (Y') and intermediate [U, V] or
[I, Q] components. Quadrature modulation then forms
modulated chroma (C); luma and chroma are then
summed. Decoding historically referred to converting an
NTSC or PAL composite signal to R'G'B'. Decoding
involves luma/chroma separation, quadrature demodulation
to recover [U, V] or [I, Q], then scaling to recover
[CB, CR] or [PB, PR], or matrixing of luma and chroma to
recover R'G'B'. Encoding and decoding are now general
Transcoding refers to the technical
aspects of conversion; signal modifications
for creative purposes are
not encompassed by the term.



In radio frequency (RF) tech- 
nology, upconversion refers to 
conversion of a signal to a higher 
carrier frequency; downconversion 
refers to conversion of a signal to 
a lower carrier frequency. 



Watkinson, John, The Engineer's 
Guide to Standards Conversion 
(Petersfield, Hampshire, 
England: Snell & Wilcox, 1994). 



terms; they may refer to JPEG, M-JPEG, MPEG, or other 
encoding or decoding processes. 
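The quadrature modulation and demodulation steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: a flat field of constant Y', U, and V; a subcarrier sampled at four samples per cycle (echoing 4fSC); and generic sine and cosine axes rather than the exact NTSC or PAL phase conventions. The function names are hypothetical. Averaging over whole subcarrier cycles stands in for luma/chroma separation, which works only for a flat field; practical decoders use notch or comb filters instead.

```python
import math

def encode_composite(y, u, v, n_samples, samples_per_cycle=4):
    """Sum luma and quadrature-modulated chroma, sampled at a multiple of the subcarrier."""
    samples = []
    for n in range(n_samples):
        phase = 2 * math.pi * n / samples_per_cycle
        chroma = u * math.sin(phase) + v * math.cos(phase)  # modulated chroma C
        samples.append(y + chroma)                          # composite = luma + chroma
    return samples

def decode_composite(samples, samples_per_cycle=4):
    """Recover (y, u, v) from a flat field spanning a whole number of subcarrier cycles."""
    n = len(samples)
    if n % samples_per_cycle:
        raise ValueError("need a whole number of subcarrier cycles")
    phases = [2 * math.pi * k / samples_per_cycle for k in range(n)]
    y = sum(samples) / n                                               # chroma averages to zero
    u = 2 * sum(s * math.sin(p) for s, p in zip(samples, phases)) / n  # demodulate U axis
    v = 2 * sum(s * math.cos(p) for s, p in zip(samples, phases)) / n  # demodulate V axis
    return y, u, v

print(decode_composite(encode_composite(0.5, 0.1, -0.2, 16)))
```

At four samples per cycle the composite samples are simply Y'+V, Y'+U, Y'-V, and Y'-U, which is one reason sampling at four times the subcarrier was attractive for composite digital equipment.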

Transcoding traditionally referred to conversion among 
different color encoding methods having the same scan- 
ning standard. Transcoding of component video 
involves chroma interpolation, matrixing, and chroma 
subsampling. Transcoding of composite video involves 
decoding, then reencoding to the other standard. With 
the emergence of compressed storage and digital distribution,
the term transcoding is now applied to
various methods of recoding compressed bitstreams, or
decompressing then recompressing.

Scan conversion refers to conversion among scanning 
standards having different spatial structures, without 
the use of temporal processing. If the input and output 
frame rates differ, motion portrayal is liable to be 
impaired. (In desktop video, and low-end video, this 
operation is sometimes called scaling.) 

Historically, upconversion referred to conversion from
SDTV to HDTV, and downconversion referred to conversion
from HDTV to SDTV. Traditionally, these terms also implied
that the output frame rate matched the input;
nowadays, frame rate conversion might be
involved. High-quality upconversion and downconversion
require spatial interpolation. That, in turn, is best
performed in a progressive format: If the source is interlaced,
intermediate deinterlacing is required, even if the
target format is interlaced.

Standards conversion denotes conversion among scanning
standards having different frame rates. Historically,
the term implied similar pixel count (such as conversion
between 480i and 576i), but nowadays a standards
converter might incorporate upconversion or
downconversion. Standards conversion requires a fieldstore
or framestore; to achieve high quality, it requires
several fieldstores and motion-compensated interpolation.
The complexity of standards conversion between
480i and 576i is the reason that it has been difficult for
broadcasters and consumers to convert European material
for use in North America or Japan, or vice versa.
An electrical engineer may call
this simply frequency response.
The qualifier magnitude distinguishes
it from other functions
of frequency such as phase
frequency response.



To avoid visible pixel structure in image display, some 
overlap is necessary in the distributions of light 
produced by neighboring display elements, as 
I explained in Image structure, on page 43. Also, to 
avoid spatial aliasing in image capture, some overlap is 
necessary in the distribution of sensitivity across neigh- 
boring sensor elements. Such overlap reduces sharp- 
ness. In this chapter, I will explain resolution, which is 
closely related to sharpness. Before introducing resolu- 
tion, I must introduce the concepts of magnitude 
frequency response and bandwidth. 

Magnitude frequency response and bandwidth 

Rather than analyzing a spot of certain dimensions, we 
analyze a group of closely spaced identical elements, 
characterizing the spacing between the elements. This 
allows mathematical analysis using transforms, particu- 
larly the Fourier transform and the z-transform. 

The top graph in Figure 7.1 overleaf shows a one- 
dimensional sine wave test signal "sweeping" from zero 
frequency up to a high frequency. (This could be a one- 
dimensional function of time such as an audio wave- 
form, or the waveform of luma from one scan line of an 
image.) A typical optical or electronic imaging system 
involves temporal or spatial dispersion, which causes 
the response of the system to diminish at high 
frequency, as shown in the middle graph. The envelope 
of that waveform - the system's magnitude frequency 
response - is shown at the bottom. 
Figure 7.1 Magnitude frequency response of an electronic or optical system typically falls as
frequency increases. Bandwidth is measured at the half-power point (-3 dB), where response has
fallen to 0.707 of its value at a reference frequency (often zero frequency, or DC). Useful visible
detail is obtained from signal power beyond the half-power bandwidth, that is, at depths of
modulation less than 70.7%. I show limiting resolution, which might occur at about 10% response.



There are other definitions of band- 
width, but this is the definition that 
I recommend. In magnitude squared 
response, the half-power point is at 
0.5 on a linear scale. 



Bandwidth characterizes the range of frequencies that 
a system can capture, record, process, or transmit. Half- 
power bandwidth (also known as 3 dB bandwidth) is 
specified or measured where signal magnitude has 
fallen 3 dB - that is, to the fraction 0.707 - from its 
value at a reference frequency (often zero frequency, or 
DC). Useful visual information is typically available at 
frequencies higher than the bandwidth. In image 
science, limiting resolution is determined visually. 
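As a sketch of the half-power definition, the code below evaluates the magnitude frequency response of a short FIR filter and searches for the frequency where the response has fallen to 0.707 of its DC value. The 3-tap [1, 2, 1]/4 kernel is an arbitrary stand-in for a dispersive system, frequency is expressed in cycles per sample, and the bisection assumes a response that falls monotonically; all three are assumptions made for illustration.

```python
import math

def magnitude_response(taps, f):
    """|H(f)| of an FIR filter; f is in cycles per sample (0 to 0.5)."""
    w = 2 * math.pi * f
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = -sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return math.hypot(re, im)

def half_power_bandwidth(taps, tol=1e-9):
    """Frequency where |H| falls to 0.707 of its DC value (the -3 dB point).

    Assumes the response falls monotonically from DC toward 0.5 cycles per sample.
    """
    target = magnitude_response(taps, 0.0) / math.sqrt(2)  # reference is DC
    lo, hi = 0.0, 0.5
    if magnitude_response(taps, hi) >= target:
        return None  # never falls 3 dB within the band
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if magnitude_response(taps, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(half_power_bandwidth([0.25, 0.5, 0.25]))
```

For this kernel the magnitude response is cos²(πf), so the half-power point lands near 0.182 cycles per sample. The factor 0.707 corresponds to 20·log10(0.707), or about -3 dB; in magnitude squared (power) terms it is 0.5.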



The maximum rate at which an analog or digital elec- 
tronic signal can change state - in an imaging system, 
between black and white - is limited by frequency 
response, and is therefore characterized by bandwidth. 
When digital information is
processed or transmitted through
analog channels, bits are coded into
symbols that ideally remain independent.
Dispersion in this context is
called intersymbol interference (ISI).



k for Kell factor is unrelated to
K rating, sometimes called K factor,
which I will describe on page 542.



Kell, R.D., A.V. Bedford, and G.L. 
Fredendall, “A Determination of the 
Optimum Number of Lines in 
a Television System," in RCA Review 
5: 8-30 (July 1940). 



Hsu, Stephen C., “The Kell Factor:
Past and Present," in SMPTE Journal
95 (2): 206-214 (Feb. 1986).



Figure 7.1 shows abstract input and output signals. 
When bandwidth of an optical system is discussed, it is 
implicit that the quantities are proportional to inten- 
sity. When bandwidth of video signals is discussed, it is 
implicit that the input and output electrical signals are 
gamma-corrected. 

Many digital technologists use the term bandwidth to 
refer to data rate; however, the terms properly refer to 
different concepts. Bandwidth refers to the frequency of 
signal content in an analog or digital signal. Data rate 
refers to digital transmission capacity, independent of 
any potential signal content. A typical studio SDTV 
signal has 5.5 MHz signal bandwidth and 13.5 MB/s 
data rate - the terms are obviously not interchangeable. 

Kell effect 

Television systems in the 1930s failed to deliver the 
maximum resolution that was to be expected from 
Nyquist's work (which I introduced on page 46). In 
1934, Kell published a paper quantifying the fraction of 
the maximum theoretical resolution achieved by RCA's 
experimental television system. He called this fraction k; 
later, it became known as the Kell factor (less desirably 
denoted K). Kell's first paper gives a factor of 0.64, but
fails to give a complete description of his experimental
method. A subsequent paper (in 1940) describes the
method and gives a factor of 0.8, under somewhat
different conditions.

Kell's k factor was determined by subjective, not objec- 
tive, criteria. If the system under test had a wide, gentle 
spot profile resembling a Gaussian, closely spaced lines 
on a test chart would cease to be resolved as their 
spacing diminished beyond a certain value. If a camera 
under test had an unusually small spot size, or a display 
had a sharp distribution (such as a box), then Kell's 
k factor was determined by the intrusion of objection- 
able artifacts as the spacing reduced - also a subjective 
criterion. 

Kell and other authors published theoretical
derivations that justify a range of numerical factors;
Stephen Hsu provides a comprehensive review. In my
opinion, such numerical measures are so poorly defined
and so unreliable that they are now useless. Hsu says:



I introduced twitter on page 57. 



Mitsuhashi, Tetsuo, "Scanning Specifications
and Picture Quality," in
Fujio, T., et al., High Definition Television,
NHK Science and Technical
Research Laboratories Technical
Monograph 32 (June 1982).



Kell factor is defined so ambiguously that individual 
researchers have justifiably used different theoretical 
and experimental techniques to derive widely varying 
values of k. 

Today I consider it poor science to quantify a Kell 
factor. However, Kell made an important contribution 
to television science, and I think it entirely fitting that 
we honor him with the Kell effect: 

In a video system - including sensor, signal processing, 
and display - Kell effect refers to the loss of resolution, 
compared to the Nyquist limit, caused by the spatial 
dispersion of light power. Some dispersion is necessary to 
avoid aliasing upon capture, and to avoid objectionable 
scan line (or pixel) structure at display. 

Kell's 1934 paper concerned only progressive scanning. 
With the emergence of interlaced systems, it became 
clear that twitter resulted from excessive vertical detail. 
To reduce twitter to tolerable levels, it was necessary to 
reduce vertical resolution to substantially below that of 
a well-designed progressive system having the same 
spot size - for a progressive system with a given k, an 
interlaced system having the same spot size had to have 
lower k. Many people have lumped this consideration 
into "Kell factor," but researchers such as Mitsuhashi 
identify this reduction separately as an interlace factor 
or interlace coefficient. 

Resolution 

SDTV (at roughly 720x480), HDTV at 1280x720, and 
HDTV at 1920x1080 all have different pixel counts. 
Image quality delivered by a particular number of pixels 
depends upon the nature of the image data (e.g., 
whether the data is raster-locked or Nyquist-filtered), 
and upon the nature of the display device (e.g., 
whether it has box or Gaussian reconstruction). 

In computing, unfortunately, the term resolution has
come to refer simply to the count of vertical and horizontal
pixels in the pixel array, without regard for any
overlap at capture, or overlap at display, that may have
reduced the amount of detail in the image. A system
may be described as having "resolution" of 1152x864;
this system has a total of about one million pixels (one
megapixel, or 1 Mpx). Interpreted this way, "resolution"
doesn't depend upon whether individual pixels
can be discerned ("resolved") on the face of the display.

Resolution properly refers to spatial
phenomena. It is confusing to refer
to a sample as having 8-bit "resolution";
use precision or quantization.

Figure 7.2 Resolution wedge
pattern sweeps various horizontal
frequencies through an
imaging system. This pattern is
calibrated in terms of cycles per
picture height (here signified
PH); however, with the pattern
in the orientation shown, horizontal
resolution is measured.

Resolution in a digital image system is bounded by the 
count of pixels across the image width and height. 
However, as picture detail increases in frequency, elec- 
tronic and optical effects cause response to diminish 
even within the bounds imposed by sampling. In video, 
we are concerned with resolution that is delivered to 
the viewer; we are also interested in limitations of 
bandwidth in capture, recording, processing, and 
display. In video, resolution concerns the maximum
number of line pairs (or cycles) that can be resolved on
the display screen. This is a subjective criterion! Resolution
is related to perceived sharpness.

Resolution is usually expressed in terms of spatial 
frequency, whose units are cycles per picture width 
(C/PW) horizontally, and cycles per picture height 
(C/PH) vertically, or units closely related to these. 

Figure 7.2 depicts a resolution test chart. In the orienta- 
tion presented, it sweeps across horizontal frequencies, 
and can be used to estimate horizontal resolution. 
Turned 90°, it can be used to sweep through vertical 
frequencies, and thereby estimate vertical resolution. 

Resolution in video 

Spatial phenomena at an image sensor or at a display 
device may limit both vertical and horizontal resolu- 
tion. However, analog processing, recording, and trans- 
mission in video limits bandwidth, and thereby affects 
only horizontal resolution. Resolution in consumer
electronics refers to horizontal resolution. Vertical
resampling is now common in consumer equipment, and
this potentially affects vertical resolution. In
transform-based compression (such as JPEG, DV, and MPEG),
dispersion comparable to overlap between pixels
occurs; this affects horizontal and vertical resolution.
Figure 7.3 Vertical resolution concerns vertical
frequency. This sketch shows image data whose
power is concentrated at a vertical frequency of
3 cycles per picture height (C/PH).

Figure 7.4 Horizontal resolution concerns horizontal
frequency. This sketch shows a horizontal
frequency of 4 cycles per picture width (C/PW);
at 4:3 aspect ratio, this is equivalent to 3 C/PH.

Figure 7.5 Resolution in consumer television
refers to horizontal resolution, expressed with
reference to picture height (not width), and in
units of vertical samples (scan lines, or pixels, not
cycles). The resulting unit is TV lines per picture
height - that is, TVL/PH, or "TV lines."
[Figure label: 6 TVL/PH ("6 lines")]

Figure 7.3 illustrates how vertical resolution is defined;
Figures 7.4 and 7.5 show horizontal resolution. Confusingly,
horizontal resolution is often expressed in units of
"TV lines per picture height." Once the number of resolvable
lines is estimated, it must be corrected for the
aspect ratio of the picture. In summary:

Resolution in TVL/PH - colloquially, "TV lines" - is twice 
the horizontal resolution in cycles per picture width, 
divided by the aspect ratio of the picture. 
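The conversion in the summary above is simple arithmetic. The helper names below are hypothetical; the factor of 2 reflects that one cycle comprises two TV lines (one dark, one light), and dividing by the aspect ratio re-expresses picture width in units of picture height.

```python
def tvl_per_ph(cycles_per_picture_width, aspect_ratio):
    """Horizontal resolution in TV lines per picture height (TVL/PH)."""
    return 2 * cycles_per_picture_width / aspect_ratio

def cycles_per_pw(tvl_ph, aspect_ratio):
    """Inverse conversion: TVL/PH back to cycles per picture width."""
    return tvl_ph * aspect_ratio / 2

print(tvl_per_ph(4, 4 / 3))  # the 4 C/PW pattern of Figure 7.4, at 4:3
```

The 4 C/PW pattern of Figure 7.4 comes out at 6 TVL/PH, the "6 lines" of Figure 7.5.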

This definition enables the same test pattern calibra- 
tion scale to be used both vertically and horizontally. 

In analog video, the signal along each scan line is 
continuous; bandwidth places an upper bound on hori- 
zontal resolution. However, even in analog video, raster 
scanning samples the image in the vertical direction. 

The count of picture lines is fixed by a raster standard; 
the associated vertical sampling places an upper bound 
on vertical resolution. 

Vertical detail in an interlaced system is affected by 
both the Kell effect and an interlace effect. Historically, 
a Kell factor of about 0.7 and an interlace factor of 
about 0.7 applied, producing an overall factor of 0.5. 
Figure 7.6 Vertical resolution in
480i systems can't quite reach
the Nyquist limit of 240 cycles
(line pairs), owing to Kell and
interlace factors. Vertical resolution
is diminished, typically to
about 7/10 of 240 - that is, to 166 C/PH.

Equivalent horizontal resolution
to 166 C/PH is obtained by
multiplying by the 4:3 aspect
ratio, obtaining 221 C/PW.

Picture content consumes
about 85% of the total line
time. Dividing 221 C/PW by
0.85 yields 260 cycles per total
line. Line rate is 15.734 kHz;
260 cycles during one complete
line period corresponds to
a video frequency of about
4.2 MHz, the design point of
NTSC. There are 79 "TV lines"
per megahertz of bandwidth.

Expressed in TV lines, 166 C/PH
is multiplied by 2, to obtain 332.

[Figure labels: 166 C/PH × 2 = 332 TVL/PH ("332 lines"); 260 C/total line = 4.2 MHz; 79 TVL/PH per MHz]

As a consequence, early interlaced systems showed no 
advantage in resolution over progressive systems of the 
same bandwidth. However, scan lines were much less 
visible in the interlaced systems. 

Figure 7.6 above summarizes how vertical and horizontal
spatial frequency and bandwidth are related for
480i television. The image height is covered by
480 picture lines. Sampling theory limits vertical image
content to below 240 C/PH if aliasing is to be avoided.
Reduced by Kell and interlace factors combining to
a value of 0.7, about 166 C/PH of vertical resolution can
be conveyed. At 4:3 aspect ratio, equivalent horizontal
resolution corresponds to 4/3 times 166, or about
221 C/PW. For a horizontal blanking overhead of 15%,
that corresponds to about 260 cycles per total line
time. At a line rate of 15.734 kHz, the video circuits
should have a bandwidth of about 4.2 MHz. Repeating
this calculation for 576i yields 4.7 MHz.
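The chain of Figure 7.6 can be checked numerically. The sketch below starts from 166 C/PH and the 15.734 kHz line rate given in the text. For the active fraction it uses 711/858, the ratio that appears in Eq 7.1, rather than the rounded 85%. With that ratio the chain gives about 267 cycles per total line and about 4.2 MHz; with 0.85 it gives about 260 cycles and closer to 4.1 MHz, so the figures in the text are best read as round numbers. The function name is hypothetical.

```python
def nominal_video_bandwidth(vertical_cph, aspect_ratio, active_fraction, line_rate_hz):
    """Follow Figure 7.6 from deliverable vertical resolution to video bandwidth.

    Returns (C/PW, cycles per total line, bandwidth in Hz, TVL/PH).
    """
    horizontal_cpw = vertical_cph * aspect_ratio              # match horizontal to vertical
    cycles_per_total_line = horizontal_cpw / active_fraction  # account for blanking
    bandwidth_hz = cycles_per_total_line * line_rate_hz       # cycles/line x lines/second
    tvl_ph = 2 * vertical_cph                                 # the same detail in "TV lines"
    return horizontal_cpw, cycles_per_total_line, bandwidth_hz, tvl_ph

cpw, cycles, bw, tvl = nominal_video_bandwidth(166, 4 / 3, 711 / 858, 15_734.0)
print(f"{cpw:.0f} C/PW, {cycles:.0f} C/total line, {bw / 1e6:.2f} MHz, {tvl:.0f} TVL/PH")
```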
Studio SDTV has 720 SAL (samples
per active line); at 4:3, 720 × 3/4 = 540,
so resolution higher than
540 TVL/PH is pointless.



The NTSC, in 1941, was well aware of the Kell factor, 
and took it into account when setting the mono- 
chrome television standard with 525 total lines and 
about 480 picture lines. The numbers that I have 
quoted work out perfectly to achieve matched vertical 
and horizontal resolution, but there is no evidence that 
the NTSC performed quite this calculation. 



The relationship between bandwidth (measured in engi- 
neering units, MHz) and horizontal resolution 
(measured in consumer units, TVL/PH) depends upon 
blanking overhead and aspect ratio. For 480i systems:



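$$\frac{\text{TVL/PH}}{\text{MHz}} = \frac{1\,\text{MHz}}{f_H}\cdot 2\cdot\frac{1}{AR}\cdot\frac{S_{PW}}{S_{TL}} = \frac{1\,\text{MHz}}{15.734\,\text{kHz}}\cdot 2\cdot\frac{3}{4}\cdot\frac{711}{858} = 79\,\frac{\text{TVL/PH}}{\text{MHz}} \qquad \text{(Eq 7.1)}$$

Here f_H is the line rate, AR is the aspect ratio, S_PW is the count of samples across the picture width, and S_TL is the count of samples per total line.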



In 480i video, there are 79 TVL/PH per megahertz of
bandwidth. NTSC broadcast is limited to 4.2 MHz, so
horizontal resolution is limited to 332 "TV lines." In
576i systems, there are 78 TVL/PH per megahertz of
video. Most 625-line PAL broadcast systems have band- 
width roughly 20% higher than that of NTSC, so have 
correspondingly higher potential resolution. 
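Eq 7.1 can be evaluated the same way. In this sketch the 480i parameters come from the equation. The 576i parameters (15.625 kHz line rate and 864 samples per total line, with 702 samples across the picture width) are assumed here; the 702 figure in particular is an assumption, included because it reproduces the 78 TVL/PH per megahertz quoted above. The function name is hypothetical.

```python
def tvl_per_mhz(line_rate_hz, aspect_ratio, samples_picture_width, samples_total_line):
    """Eq 7.1: TV lines per picture height delivered per megahertz of bandwidth."""
    cycles_per_total_line = 1e6 / line_rate_hz                    # 1 MHz of signal, per line
    active_fraction = samples_picture_width / samples_total_line  # blanking overhead
    return cycles_per_total_line * 2 / aspect_ratio * active_fraction

k_480 = tvl_per_mhz(15_734.0, 4 / 3, 711, 858)
k_576 = tvl_per_mhz(15_625.0, 4 / 3, 702, 864)  # 702-sample picture width is assumed
print(round(k_480), round(k_576), round(4.2 * k_480))  # prints: 79 78 332
```

Multiplying the 480i factor by the 4.2 MHz NTSC broadcast limit recovers the 332 "TV lines" stated above.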



Viewing distance 

Pixel count in SDTV and HDTV is fixed by the corresponding
scanning standards. In Viewing distance and
angle, on page 8, I described how optimum viewing
distance is where the scan-line pitch subtends an angle
of about 1/60°. If a sampled image is viewed closer than
that distance, scan lines or pixels are liable to be visible.
With typical displays, SDTV is suitable for viewing at
about 7·PH; 1080i HDTV is suitable for viewing at
a much closer distance of about 3·PH.
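The 7·PH and 3·PH figures follow from simple geometry. Assuming that the optimum distance is where one picture-line pitch (1/N of the picture height, for N picture lines) subtends 1/60°, the distance in picture heights is 1/(N·tan(1/60°)). This is a sketch of that geometry, not a transcription of the book's Equation 1.2; the function name is hypothetical.

```python
import math

def viewing_distance_ph(picture_lines, pitch_angle_deg=1 / 60):
    """Distance, in picture heights, at which one scan-line pitch subtends pitch_angle_deg."""
    line_pitch_ph = 1 / picture_lines  # line pitch as a fraction of picture height
    return line_pitch_ph / math.tan(math.radians(pitch_angle_deg))

for name, lines in (("480i SDTV", 480), ("1080i HDTV", 1080)):
    print(f"{name}: about {viewing_distance_ph(lines):.1f} PH")
```

For a 27-inch 4:3 display, picture height is 27 × 3/5 = 16.2 inches, so 480 lines gives roughly 7.16 × 16.2 ≈ 116 inches, a little under 10 ft.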



A computer user tends to position himself or herself
where scan-line pitch subtends an angle greater than
1/60° - perhaps at half that distance. However, at such
a close distance, individual pixels are likely to be
discernible, perhaps even objectionable, and the quality
of continuous-tone images will almost certainly suffer.
Pixel count places a constraint on the closest viewing
distance; however, visibility of pixel or scan-line structure
in an image depends upon many other factors such
as sensor MTF, spot profile (PSF), and bandwidth. In
principle, if any of these factors reduces the amount of
detail in the image, the optimum viewing distance is
pushed more distant. However, consumers have formed
an expectation that SDTV is best viewed at about 7·PH;
when people become familiar with HDTV they will form
an expectation that it is best viewed at about 3·PH.

Bernie Lechner found, in unpublished research, that 
North American viewers tend to view SDTV receivers 
at about 9 ft. In similar experiments at Philips Labs in 
England, Jackson found a preference for 3 m. This 
viewing distance is sometimes called the Lechner 
distance - or in Europe, the Jackson distance! These
numbers are consistent with Equation 1.2, on page 8, 
applied to a 27-inch (70 cm) diagonal display. 

Rather than saying that improvements in bandwidth or 
spot profile enable decreased viewing distance, and 
therefore wider picture angle, we assume that viewing 
distance is fixed, and say that resolution is improved. 

Interlace revisited 

We can now revisit the parameters of interlaced scan- 
ning. At luminance and ambient illumination typical of 
television receivers, a vertical scan rate of 50 or 60 Hz is 
sufficient to overcome flicker. As I mentioned on 
page 56, at practical vertical scan rates, it is possible to 
flash alternate image rows in alternate vertical scans 
without causing flicker. This is interlace. The scheme is 
possible owing to the fact that temporal sensitivity of 
the visual system decreases at high spatial frequencies. 

Twitter is introduced, however, by vertical detail whose 
scale approaches the scan-line pitch. Twitter can be 
reduced to tolerable levels by reducing the vertical 
detail somewhat, to perhaps 0.7 times. On its own, this 
reduction in vertical detail would push the viewing 
distance back to 1.4 times that of progressive scanning. 
Twitter and scan-line visibility are
inversely proportional to the count
of image rows, a one-dimensional
quantity. However, sharpness is
proportional to pixel count, a two-dimensional
(areal) quantity. To
overcome twitter at the same
picture angle, 1.4 times as many
image rows are required; however,
1.2 times as many rows and 1.2
times as many columns are still
available to improve picture angle.



However, to maintain the same sharpness as a progres- 
sive system at a given data capacity, all else being 
equal, in interlaced scanning only half the picture data 
needs to be transmitted in each vertical scan period 
(field). For a given frame rate, this reduction in data per 
scan enables pixel count per frame to be doubled. 

The pixels gained could be exploited in one of three 
ways: By doubling the row count, by doubling the 
column count, or by distributing the additional pixels 
proportionally to image columns and rows. Taking the
third approach, doubling the pixel count would multiply
both column count and row count by about 1.4 (the
square root of 2), enabling a reduction of viewing
distance to 0.7 of progressive scan. This would win back
the lost viewing distance associated with twitter, and
would yield equivalent performance to progressive scan.

Ideally, though, the additional pixels owing to interlaced
scan should not be distributed proportionally to
picture width and height. Instead, the count of image
rows should be increased by about 1.7 (1.4 × 1.2),
and the count of image columns by about 1.2. The factor
of 1.4 in the row count alleviates twitter; the factor
of 1.2 increase in both row and column count yields
a small improvement in viewing distance - and therefore
picture angle - over a progressive system.
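The allocation in the margin note can be worked through numerically. The sketch assumes exactly the numbers the text uses: vertical detail reduced to 0.7 to control twitter, and a twofold pixel gain from interlace. Rows first grow by 1/0.7 to restore vertical detail, and the remaining areal gain is shared equally by rows and columns. The function name is hypothetical.

```python
import math

def interlace_allocation(twitter_factor=0.7, pixel_gain=2.0):
    """Distribute the extra pixels that interlace makes available.

    Returns (row factor, column factor, viewing distance relative to progressive).
    """
    rows_for_twitter = 1 / twitter_factor      # about 1.4, offsets twitter filtering
    remaining = pixel_gain / rows_for_twitter  # about 1.4 of areal gain left over
    per_dimension = math.sqrt(remaining)       # about 1.2 in each direction
    rows = rows_for_twitter * per_dimension    # about 1.7
    columns = per_dimension                    # about 1.2
    relative_distance = 1 / per_dimension      # about 0.85 of the progressive distance
    return rows, columns, relative_distance

rows, cols, dist = interlace_allocation()
print(f"rows x{rows:.2f}, columns x{cols:.2f}, viewing distance x{dist:.2f}")
```

By contrast, the proportional allocation gives √2 in each direction; after the 0.7 twitter reduction, effective vertical detail is √2 × 0.7 ≈ 0.99, essentially the same as progressive scan.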

Interlaced scanning was chosen over progressive in the 
early days of television, half a century ago. All other 
things being equal - such as data rate, frame rate, spot 
size, and viewing distance - various advantages have 
been claimed for interlaced scanning.

If you neglect the introduction of twitter, and consider 
just the static pixel array, interlace offers twice the static 
resolution for a given bandwidth and frame rate. 

If you consider an interlaced image of the same size as 
a progressive image and viewed at the same distance - 
that is, preserving the picture angle - then there is 
a decrease in scan-line visibility. 
Constant luminance



8 



Video systems convey color image data using one 
component to represent lightness, and two other 
components to represent color, absent lightness. In 
Color science for video, on page 233, I will detail how 
luminance can be formed as a weighted sum of linear 
RGB values that are proportional to optical power. 
Transmitting relative luminance - preferably after impo- 
sition of a nonlinear transfer function - is called the 
Principle of Constant Luminance. 



The term luminance is widely 
misused in video. See Relative 
luminance, on page 206, and 
Appendix A, YUV and luminance 
considered harmful, on page 595. 



Video systems depart from this principle and imple- 
ment an engineering approximation. A weighted sum of 
linear RGB is not computed. Instead, a nonlinear 
transfer function is applied to each linear RGB compo- 
nent, then a weighted sum of the nonlinear gamma- 
corrected R'G'B' components forms what I call luma. 
(Many video engineers carelessly call this lumi- 
nance.) As far as a color scientist is concerned, 
a video system uses the theoretical matrix coefficients 
of color science but uses them in the wrong block 
diagram: In video, gamma correction is applied before 
the matrix, instead of the color scientist's preference, 
after. 



Applebaum, Sidney, “Gamma 
Correction in Constant Luminance 
Color Television Systems," in Proc. 
IRE, 40 (11): 1185-1195 
(Oct. 1952). 



In this chapter, I will explain why and how all video 
systems depart from the principle. If you are willing to 
accept this departure from theory as a fact, then you 
may safely skip this chapter, and proceed to Introduc- 
tion to luma and chroma, on page 87, where I will intro- 
duce how the luma and color difference signals are 
formed and subsampled. 
The principle of constant luminance

Ideally, the lightness component in color video would 
mimic a monochrome system: Relative luminance 
would be computed as a properly weighted sum of 
(linear-light) R, G, and B tristimulus values, according to 
the principles of color science that I will explain in 
Transformations between RGB and CIE XYZ, on 
page 251. At the decoder, the inverse matrix would
reconstruct the linear R, G, and B tristimulus values: 

Figure 8.1 Formation of relative luminance
[Diagram: linear R, G, B → matrix [P] → Y; at the decoder, inverse matrix [P^-1] → R, G, B]

Two color difference (chroma) components would be 
computed, to enable chroma subsampling; these would 
be conveyed to the decoder through separate channels: 

Figure 8.2 Chroma 
components (linear) 




Set aside the chroma components for now: No matter 
how they are handled, all of the relative luminance is 
recoverable from the luminance channel. 
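A constant luminance encoder and decoder, as in Figures 8.1 and 8.2, can be sketched on linear-light values. The Rec. 709 luminance coefficients below are an assumption for illustration (the proper weights depend on the primaries, as described in the color science chapters), and the function names are hypothetical. Discarding both color differences entirely still decodes to a gray having the original relative luminance, which is the property described above.

```python
# Rec. ITU-R BT.709 luminance coefficients, assumed here for illustration
KR, KG, KB = 0.2126, 0.7152, 0.0722

def relative_luminance(r, g, b):
    """Weighted sum of linear-light R, G, B tristimulus values."""
    return KR * r + KG * g + KB * b

def cl_encode(r, g, b):
    """Constant luminance encoder: Y plus two linear color differences."""
    y = relative_luminance(r, g, b)
    return y, b - y, r - y

def cl_decode(y, b_minus_y, r_minus_y):
    """Inverse matrix: recover linear R, G, B."""
    r = r_minus_y + y
    b = b_minus_y + y
    g = (y - KR * r - KB * b) / KG
    return r, g, b

y, b_y, r_y = cl_encode(0.2, 0.5, 0.8)
print(cl_decode(y, b_y, r_y))                        # original R, G, B
print(y, relative_luminance(*cl_decode(y, 0.0, 0.0)))  # chroma discarded, Y intact
```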

If relative luminance were conveyed directly, 11 bits or 
more would be necessary. Eight bits barely suffice if we 
use Nonlinear image coding, introduced on page 12, to 
impose perceptual uniformity: We could subject rela- 
tive luminance to a nonlinear transfer function that 
mimics vision's lightness sensitivity. Lightness can be 
approximated as CIE L* (to be detailed on page 208); 

L* is roughly the 0.4-power of relative luminance. 
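The "roughly the 0.4-power" claim can be checked against the CIE L* formula: 116·Y^(1/3) - 16 above a small threshold, with a linear segment near black. This sketch scales the power function to the same 0 to 100 range; the function names are hypothetical. The two agree within a point or so across the midtones and at white, and diverge near black, where the L* curve is linear.

```python
def cie_lstar(y):
    """CIE L* (0..100) from relative luminance y, with white at y = 1."""
    if y > (6 / 29) ** 3:           # about 0.008856
        return 116 * y ** (1 / 3) - 16
    return (29 / 3) ** 3 * y        # linear segment near black, about 903.3 * y

def power_04(y):
    """The 0.4-power approximation, scaled to the same 0..100 range."""
    return 100 * y ** 0.4

for y in (0.05, 0.18, 0.5, 1.0):
    print(f"Y={y:<5} L*={cie_lstar(y):6.2f}  100*Y^0.4={power_04(y):6.2f}")
```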

Figure 8.3 Nonlinearly coded relative luminance
The inverse transfer function would be applied at
decoding:

Figure 8.4 Nonlinearly coded 
relative luminance 




If a video system were to operate in this manner, it 
would exhibit the Principle of Constant Luminance: 

All of the relative luminance would be present in, and 
recoverable from, a single component. 

Compensating the CRT 

Unfortunately for the theoretical block diagram - but 
fortunately for video, as you will see in a moment - the 
electron gun of a CRT monitor introduces a power func- 
tion having an exponent of approximately 2.5: 

Figure 8.5 CRT transfer function




In a constant luminance system, the decoder would
have to invert the monitor's power function. This would
require insertion of a compensating transfer function -
roughly a 1/2.5-power function - in front of the CRT:

Figure 8.6 Compensating the CRT transfer




The decoder would now include two power functions:
An inverse L* function with an exponent close to 2.5 to
undo the perceptually uniform coding, and a power
function with an exponent of 1/2.5 to compensate the
CRT. Having two nonlinear transfer functions at every
decoder would be expensive and impractical. Notice
that the exponents of the power functions are 2.5 and
1/2.5 - the functions are inverses!
Departure from constant luminance

To avoid the complexity of incorporating two power 
functions into a decoder's electronics, we begin by rear- 
ranging the block diagram, to interchange the "order of 
operations" of the matrix and the CRT compensation: 

Figure 8.7 Rearranged decoder 




Upon rearrangement, the two power functions are adja- 
cent. Since the functions are effectively inverses, the 
combination of the two has no effect. Both functions 
can be dropped from the decoder: 

Figure 8.8 Simplified decoder 




The decoder now comprises simply the inverse of the 
encoder matrix, followed by the 2.5-power function 
that is intrinsic to the CRT. Rearranging the decoder 
requires that the encoder also be rearranged, so as to 
mirror the decoder and achieve correct end-to-end 
reproduction of the original RGB tristimulus values: 

Figure 8.9 Rearranged encoder 




Television engineers who are
uneducated in color science often
mistakenly call luma (Y') by the
name luminance and denote it by
the unprimed symbol Y. This leads
to great confusion, as I explain in
Appendix A, on page 595.



The rearranged flow diagram of Figure 8.9 is not mathe- 
matically equivalent to the arrangement of Figures 8.1 
through 8.4! The encoder's matrix no longer operates 
on (linear) tristimulus signals, and relative luminance is 
no longer computed. Instead, a nonlinear quantity Y', 
denoted luma, is computed and transmitted. Luma 
involves an engineering approximation: The system no 
longer adheres strictly to the Principle of Constant Lumi- 
nance (though it is often mistakenly claimed to do so). 
Tristimulus values are correctly reproduced by the
arrangement of Figure 8.9, and it is highly practical.
Figure 8.9 encapsulates the basic signal flow for all 
video systems; it will be elaborated in later chapters. 
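The rearranged signal flow of Figure 8.9 can be sketched end to end. Assumptions: Rec. 601 luma coefficients (0.299, 0.587, 0.114), a pure 1/2.5-power function at the encoder, and a pure 2.5-power function standing in for the CRT. The function names are hypothetical. Without subsampling, the round trip reproduces the original tristimulus values to floating-point precision, even though the matrix operates on gamma-corrected components.

```python
# Rec. 601 luma coefficients, assumed for this sketch
KR, KG, KB = 0.299, 0.587, 0.114
GAMMA = 2.5  # CRT exponent; the encoder applies 1/2.5 = 0.4

def encode(r, g, b):
    """Figure 8.9 encoder: gamma-correct each component, then matrix to Y', B'-Y', R'-Y'."""
    rp, gp, bp = (c ** (1 / GAMMA) for c in (r, g, b))
    yp = KR * rp + KG * gp + KB * bp  # luma, not luminance
    return yp, bp - yp, rp - yp

def decode(yp, b_minus_y, r_minus_y):
    """Figure 8.9 decoder: inverse matrix, then the CRT's own 2.5-power function."""
    rp = r_minus_y + yp
    bp = b_minus_y + yp
    gp = (yp - KR * rp - KB * bp) / KG
    # Clamp guards tiny negative rounding errors, which a fractional power would turn complex
    return tuple(max(c, 0.0) ** GAMMA for c in (rp, gp, bp))

print(decode(*encode(0.2, 0.5, 0.8)))
```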

In the rearranged encoder, we no longer use CIE L* to 
optimize for perceptual uniformity. Instead, we use the 
inverse of the transfer function inherent in the CRT. 

A 0.4-power function accomplishes approximately 
perceptually uniform coding, and reproduces tristim- 
ulus values proportional to those in the original scene. 

You will learn in the following chapter, Rendering intent, 
that the 0.4 value must be altered to about 0.5 to 
accommodate a perceptual effect. This alteration 
depends upon viewing environment; display systems 
should have adjustments for rendering intent, but they 
don't! Before discussing the alteration, I will outline the 
repercussions of the nonideal block diagram. 

"Leakage" of luminance into chroma 

Until now, we have neglected the color difference 
components. In the rearranged block diagram of 
Figure 8.9 at the bottom of the facing page, color 
difference components are "matrixed" from nonlinear
(gamma-corrected) R'G'B':

Figure 8.10 Chroma components 




In a true constant luminance system, no matter how the 
color difference signals are handled, all of the relative 
luminance is carried by the luminance channel. In the 
rearranged system, most of the relative luminance is 
conveyed through the Y' channel; however, some relative
luminance can be thought of as "leaking" into the
color difference components. If the color difference
components were not subsampled, this would present
no problem. However, the color difference components
are formed to enable subsampling! So, we now turn our 
attention to that. 
Figure 8.11 Subsampled
chroma components 




Figure 8.12 Failure to adhere
to constant luminance is
evident in the dark band in
the green-magenta transition
of the colorbar test signal.



In Figure 8.11 above, I show the practical block diagram 
of Figure 8.10, augmented with subsampling filters in 
the chroma paths. With nonconstant luminance coding, 
some of the relative luminance traverses the chroma 
pathways. Subsampling not only removes detail from 
the color components, it also removes detail from the 
"leaked" relative luminance. Consequently, relative 
luminance is incorrectly reproduced: In areas where 
luminance detail is present in saturated colors, relative 
luminance is reproduced too dark, and saturation is 
reduced. This is the penalty that must be paid for lack 
of strict adherence to the Principle of Constant Lumi- 
nance. These errors are perceptible by experts, but they 
are very rarely noticeable - let alone objectionable - in 
normal scenes. The departure from theory is apparent in 
the dark band appearing between the green and 
magenta color bars of the standard video test pattern, 
depicted in Figure 8.12 in the margin. 
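The dark band can be reproduced numerically. This sketch makes several assumptions of its own: Rec. 601 luma weights (also reused as linear luminance weights, for illustration), a pure 2.5-power display, and the crudest possible 2:1 subsampling, namely averaging the color differences of one green pixel and one adjacent magenta pixel. Without subsampling the decoder restores both colors exactly; with the shared (averaged) chroma, both pixels reproduce darker than the originals.

```python
# Assumptions: Rec. 601 weights (reused as linear luminance weights),
# a pure 2.5-power display, and 2:1 subsampling by simple averaging.
KR, KG, KB = 0.299, 0.587, 0.114
GAMMA = 2.5

def encode(rp, gp, bp):
    """R'G'B' -> (Y', B'-Y', R'-Y')."""
    y = KR * rp + KG * gp + KB * bp
    return y, bp - y, rp - y

def decode(y, b_y, r_y):
    """(Y', B'-Y', R'-Y') -> R'G'B'."""
    rp, bp = y + r_y, y + b_y
    gp = (y - KR * rp - KB * bp) / KG
    return rp, gp, bp

def displayed_luminance(rp, gp, bp):
    """Relative luminance produced by a 2.5-power display."""
    # Clamp tiny negative values (floating-point residue) before the power.
    r, g, b = (max(c, 0.0) ** GAMMA for c in (rp, gp, bp))
    return KR * r + KG * g + KB * b

green, magenta = (0.0, 1.0, 0.0), (1.0, 0.0, 1.0)
eg, em = encode(*green), encode(*magenta)

# The two neighbors share one averaged pair of color difference values.
avg_b = (eg[1] + em[1]) / 2
avg_r = (eg[2] + em[2]) / 2

for name, rgb, e in [("green", green, eg), ("magenta", magenta, em)]:
    original = displayed_luminance(*rgb)
    shared = displayed_luminance(*decode(e[0], avg_b, avg_r))
    print(f"{name}: original {original:.3f}, after subsampling {shared:.3f}")
```

Because green and magenta have opposite color differences, the average is essentially zero and each pixel decodes to a gray at its own luma; the 2.5 power then pulls that gray well below the original luminance.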



The notation 4:2:2 has come to
denote not just chroma subsampling
but a whole set of SDTV

To summarize signal encoding in video systems: First,
a nonlinear transfer function, gamma correction, comparable
to a square root, is applied to each of the linear R,
G, and B tristimulus values to form R', G', and B'. Then,
a suitably weighted sum of the nonlinear components is
computed to form the luma signal (Y'). Luma approximates
the lightness response of vision. Color difference
components blue minus luma (B'-Y') and red minus
luma (R'-Y') are formed. (Luma, B'-Y', and R'-Y' can be
computed from R', G', and B' simultaneously, through
a 3x3 matrix.) The color difference components are
then subsampled (filtered), using one of several
schemes, including 4:2:2, 4:1:1, and 4:2:0, to be
described starting on page 87.
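The whole encoding chain fits in a few lines. In this sketch a pure square root stands in for gamma correction (the text says only that the real curve is comparable to one), and the Rec. 601 luma weights are assumed; the matrix rows then produce Y', B'-Y', and R'-Y' directly from R'G'B'.

```python
# Assumptions: a pure square root stands in for gamma correction,
# and the Rec. 601 luma weights are used.
KR, KG, KB = 0.299, 0.587, 0.114

# Rows produce Y', B'-Y', and R'-Y' from [R', G', B'].
MATRIX = (
    (KR, KG, KB),            # Y'
    (-KR, -KG, 1 - KB),      # B' - Y'
    (1 - KR, -KG, -KB),      # R' - Y'
)

def gamma_correct(rgb):
    """Linear R, G, B -> nonlinear R', G', B' (square-root approximation)."""
    return tuple(c ** 0.5 for c in rgb)

def matrix(rgb_prime):
    """3x3 matrix multiplication: R'G'B' -> (Y', B'-Y', R'-Y')."""
    return tuple(sum(m * c for m, c in zip(row, rgb_prime)) for row in MATRIX)

y, b_y, r_y = matrix(gamma_correct((0.25, 0.25, 0.25)))  # a neutral gray
```

For any neutral input the two color difference rows sum to zero, so a gray carries no chroma; only colored areas are affected by the subsequent subsampling.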
Rendering intent



9 



Giorgianni, Edward J., and
T. E. Madden, Digital Color Management:
Encoding Solutions (Reading,
Mass.: Addison-Wesley, 1998).



I use the term white to refer to 
diffuse white, which I will explain on 
page 83. 



Examine the flowers in a garden at noon on a bright, 
sunny day. Look at the same garden half an hour after 
sunset. Physically, the spectra of the flowers have not 
changed, except by scaling to lower luminance levels. 
However, the flowers are markedly less colorful after 
sunset: Colorfulness decreases as luminance decreases. 

Reproduced images are usually viewed at a small fraction,
perhaps 1/100 or 1/1000, of the luminance at which
they were captured. If reproduced luminance were 
made proportional to scene luminance, the reproduced 
image would appear less colorful, and lower in contrast, 
than the original scene. 

To reproduce contrast and colorfulness comparable to 
the original scene, we must alter the characteristics of 
the image. An engineer or physicist might strive to 
achieve mathematical linearity in an imaging system; 
however, the required alterations cause reproduced 
luminance to depart from linearity. The dilemma is this: 
We can achieve mathematical linearity, or we can 
achieve correct appearance, but we cannot simulta- 
neously do both! Successful commercial imaging 
systems sacrifice mathematics to achieve the correct 
perceptual result. 

If "white" in the viewing environment is markedly 
darker than "white" in the environment in which it was 
captured, the tone scale of an image must be altered. 
An additional reason for correction is the surround 
effect, which I will now explain. 
Figure 9.1 Surround effect.

The three squares surrounded 
by light gray are identical to 
the three squares surrounded 
by black; however, each of the 
black-surround squares is 
apparently lighter than its 
counterpart. Also, the contrast 
of the black-surround series 
appears lower than that of the 
white-surround series. 



DeMarsh, LeRoy E., and Edward
J. Giorgianni, "Color Science for
Imaging Systems," in Physics
Today, Sept. 1989, 44-52.




Image-related scattered light 
is called flare. 



Simultaneous contrast has 
another meaning, where it is 
a contraction of simultaneous
contrast ratio (distinguished 
from sequential contrast ratio). 
See Contrast ratio, on page 197. 



Surround effect 

Human vision adapts to an extremely wide range of 
viewing conditions, as I will detail in Adaptation, on 
page 196. One of the mechanisms involved in adapta- 
tion increases our sensitivity to small brightness varia- 
tions when the area of interest is surrounded by bright 
elements. Intuitively, light from a bright surround can 
be thought of as spilling or scattering into all areas of 
our vision, including the area of interest, reducing its 
apparent contrast. Loosely speaking, the visual system 
compensates for this effect by "stretching" its contrast 
range to increase the visibility of dark elements in the 
presence of a bright surround. Conversely, when the 
region of interest is surrounded by relative darkness, 
the contrast range of the vision system decreases: Our 
ability to discern dark elements in the scene decreases. 
The effect is demonstrated in Figure 9.1 above, from 
DeMarsh and Giorgianni. The surround effect stems 
from the perceptual phenomenon called the simulta- 
neous contrast effect, also known as lateral inhibition. 

The surround effect has implications for the display of 
images in dark areas, such as projection of movies in 
a cinema, projection of 35 mm slides, or viewing of 
television in your living room. If an image were repro- 
duced with the correct relative luminance, then when 
viewed in a dark or dim surround, it would appear 
lacking in contrast. 
Image reproduction is not simply concerned with
physics, mathematics, chemistry, and electronics: 
Perceptual considerations play an essential role. 

Tone scale alteration 

Tone scale alteration is necessary mainly for the two 
reasons that I have described: The luminance of a 
reproduction is typically dramatically lower than the 
luminance of the original scene, and the surround of a 
reproduced image is rarely comparable to the surround 
of the original scene. Two additional reasons contribute 
to the requirement for tone scale alteration: limitation 
of contrast ratio, and specular highlights. 

Simultaneous contrast ratio is the 
ratio of luminances of the lightest 
and darkest elements of a scene 
(or an image). For details, see 
Contrast ratio, on page 197. 



Diffuse white refers to the luminance of a diffusely 
reflecting white surface in a scene. Paper reflects 
diffusely, and white paper reflects about 90% of inci- 
dent light, so a white card approximates diffuse white. 
However, most scenes contain shiny objects that reflect 
directionally. When viewed in certain directions, these 
objects reflect specular highlights having luminances 
perhaps ten times that of diffuse white. At the repro- 
duction device, we can seldom afford to reproduce 
diffuse white at merely 10% of the maximum lumi- 
nance of the display, solely to exactly reproduce the 
luminance levels of the highlights! Nor is there any 
need to reproduce highlights exactly: A convincing 
image can be formed with highlight luminance greatly 
reduced from its true value. To make effective use of 
luminance ranges that are typically available in image 
display systems, highlights must be compressed. 

An original scene typically has a ratio of luminance
levels (a simultaneous contrast ratio) of 1000:1 or
more. However, contrast ratio in the captured image is
limited by optical flare in the camera. Contrast ratio at
the display is likely to be limited even further, by physical
factors and by display flare, to perhaps 100:1.

Incorporation of rendering intent

The correction that I have mentioned is achieved by
subjecting luminance (or, in the case of a color system,
tristimulus values) to an end-to-end power function
having an exponent between about 1.1 and 1.6. The
exponent depends primarily upon the ratio of scene
luminance to reproduction luminance. The exponent 
depends to some degree upon the display physics and 
the viewing environment. Nearly all image reproduc- 
tion systems require some tone scale alteration. 

In Constant luminance, on page 75, I outlined
considerations of nonlinear coding in video. Continuing the
sequence of sketches from Figure 8.9, on page 78,
Figure 9.2 shows that correction for typical television
viewing could be effected by including, in the decoder,
a power function having an exponent of about 1.25:

Figure 9.2 Imposition of
rendering at decoder




Observe that a power function is already a necessary 
part of the encoder. Instead of altering the decoder, we 
modify the encoder's power function to approximate a 
0.5-power, instead of the physically correct 0.4-power: 




Figure 9.3 Imposition of 
rendering at encoder 




[Figure 9.3 labels: Scene tristimulus values; Reproduction tristimulus values, for dim surround]



Concatenating the 0.5-power at encoding and the 
2.5-power at decoding produces the end-to-end 
1.25-power required for television display in a dim 
surround. To recover scene tristimulus values, the 
encoding transfer function should simply be inverted; 
the decoding function then approximates a 2.0-power 
function, as sketched at the bottom right of Figure 9.3. 
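The concatenation of exponents can be checked in a few lines. This Python sketch assumes pure power functions throughout (no linear segment near black), the same idealization the text uses here; the function names are illustrative.

```python
ENCODE = 0.5   # encoder exponent, altered from 0.4 to impose rendering intent
DECODE = 2.5   # exponent intrinsic to the CRT

def reproduce(scene):
    """Scene tristimulus value -> value displayed in a dim surround."""
    return (scene ** ENCODE) ** DECODE      # net power: 0.5 * 2.5 = 1.25

def recover_scene(signal):
    """Exact inverse of the encoding: a 2.0 power, i.e. 1 / 0.5."""
    return signal ** (1 / ENCODE)

for scene in (0.01, 0.18, 0.5, 1.0):
    print(scene, round(reproduce(scene), 4), round(recover_scene(scene ** ENCODE), 4))
```

Note that the display path and the scene-recovery path differ only in the exponent applied after decoding: 2.5 imposes the rendering intent, while 2.0 undoes the encoding and returns the original scene values.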



As I mentioned in the marginal note on page 26,
depending upon the setting of the brightness control,
the effective power function exponent at a CRT varies
from its nominal 2.5 value. In a dark viewing
environment, such as a home theater, the display's
brightness setting will be reduced; the decoder's
effective exponent will rise to about 2.7, and the
end-to-end power will rise to about 1.5. In a bright
surround, such as a computer in a desktop
environment, brightness will be increased; this will
reduce the effective exponent to about 2.3, and
thereby reduce the end-to-end exponent to about 1.125.

Imaging system                       Encoding  "Advertised"  Decoding  Typ.      End-to-end
                                     exponent  exponent      exponent  surround  exponent
Cinema                               0.6       0.6           2.5       Dark      1.5
Television (Rec. 709, see page 263)  0.5       0.45          2.5       Dim       1.25
Office (sRGB, see page 267)          0.45      0.42          2.5       Light     1.125

Table 9.1 End-to-end power functions for several imaging systems. The encoding exponent
achieves approximately perceptual coding. (The "advertised" exponent neglects the scaling and
offset associated with the straight-line segment of encoding.) The decoding exponent acts at the
display to approximately invert the perceptual encoding. The product of the two exponents sets
the end-to-end power function that imposes the rendering intent.

The encoding exponent, decoding exponent, and end- 
to-end power function for cinema, television, and office 
CRT viewing are shown in Table 9.1 above. 



Fairchild, Mark D., Color Appear- 
ance Models (Reading, Mass.: 
Addison-Wesley, 1998). 

James, T. H., ed., The Theory of the
Photographic Process, Fourth Edition
(Rochester, N.Y.: Eastman Kodak,
1977). See Ch. 19 (p. 537),
Preferred Tone Reproduction.



In film systems, the necessary correction is designed 
into the transfer function of the film (or films). Color 
reversal (slide) film is intended for viewing in a dark 
surround; it is designed to have a gamma considerably 
greater than unity - about 1.5 - so that the contrast 
range of the scene is expanded upon display. In cinema 
film, the correction is achieved through a combination 
of the transfer function ("gamma" of about 0.6) built 
into camera negative film and the overall transfer func- 
tion ("gamma" of about 2.5) built into print film. 



Some people suggest that NTSC
should be gamma-corrected with
power of 1/2.2, and PAL with 1/2.8.
I disagree with both interpretations;
see page 268.



I have described video systems as if they use a pure 
0.5-power law encoding function. Practical consider- 
ations necessitate modification of the pure power func- 
tion by the insertion of a linear segment near black, as 
I will explain in Gamma, on page 257. The exponent in 
the Rec. 709 standard is written ("advertised") as 0.45; 
however, the insertion of the linear segment, and the 
offsetting and scaling of the pure power function 
segment of the curve, cause an exponent of about 0.51 
to best describe the overall curve. (To describe gamma 
as 0.45 in this situation is misleading.) 
In the sRGB standard, the
exponent is written ("advertised")
as 1/2.4 (about 0.417).
However, the insertion of the 
linear segment, and the offset- 
ting and scaling of the pure 
power function segment of the 
curve, cause an exponent of 
about 0.45 to best describe the 
overall curve. See sRGB transfer 
function, on page 267. 
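The gap between the "advertised" and effective exponents can be estimated numerically. The text does not say how the "best" exponent is determined, so this sketch assumes a least-squares fit of a pure power function over uniformly spaced linear values, using the commonly published Rec. 709 and sRGB segment constants. Other fitting criteria give somewhat different numbers, but with this one the Rec. 709 curve fits close to a 0.5 power and the sRGB curve close to a 0.45 power, despite their advertised 0.45 and 1/2.4.

```python
def rec709_encode(L):
    """Rec. 709 encoding: linear segment near black, offset/scaled 0.45 power above."""
    return 4.5 * L if L < 0.018 else 1.099 * L ** 0.45 - 0.099

def srgb_encode(L):
    """sRGB encoding: linear segment near black, offset/scaled 1/2.4 power above."""
    return 12.92 * L if L <= 0.0031308 else 1.055 * L ** (1 / 2.4) - 0.055

def best_exponent(encode, n=1000):
    """Exponent of the pure power function closest to `encode` (least squares,
    grid search over 0.300 ... 0.700 in steps of 0.001)."""
    samples = [i / n for i in range(1, n + 1)]
    targets = [encode(L) for L in samples]
    best_g, best_err = None, float("inf")
    for k in range(300, 701):
        g = k / 1000
        err = sum((t - L ** g) ** 2 for L, t in zip(samples, targets))
        if err < best_err:
            best_g, best_err = g, err
    return best_g

print("Rec. 709 effective exponent:", best_exponent(rec709_encode))
print("sRGB effective exponent:", best_exponent(srgb_encode))
```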



Rendering intent in desktop computing 

In the desktop computer environment, the ambient 
condition is considerably brighter, and the surround is 
brighter, than is typical of television viewing. An end- 
to-end exponent lower than the 1.25 of video is called 
for; a value around 1.125 is generally suitable. However, 
desktop computers are used in a variety of different 
viewing conditions. It is not practical to originate every 
image in several forms, optimized for several potential 
viewing conditions! A specific encoding function needs 
to be chosen. Achieving optimum reproduction in 
diverse viewing conditions requires selecting a suitable 
correction at display time. Technically, this is easy to 
achieve: Modern computer display subsystems have 
hardware lookup tables (LUTs) that can be loaded 
dynamically with appropriate curves. However, it is 
a challenge to train users to make a suitable choice. 
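As a sketch of what "loading a LUT with an appropriate curve" might mean, the hypothetical function below builds a table that changes the end-to-end exponent. It assumes the display behaves as a pure power function of the normalized code value; under that assumption, raising each code to the power k multiplies the end-to-end exponent by k. The function name and the 256-entry size are illustrative, not any particular system's API.

```python
def correction_lut(current_exponent, target_exponent, size=256):
    """Hypothetical display-time LUT that changes the end-to-end exponent.

    Assumes the display acts as a pure power function of the normalized
    code value, so mapping code -> code**k scales the end-to-end
    exponent by k = target / current.
    """
    k = target_exponent / current_exponent
    top = size - 1
    return [round(top * (code / top) ** k) for code in range(size)]

# Example: image data rendered for a light surround (about 1.125),
# viewed instead in a dim surround (about 1.25).
lut = correction_lut(1.125, 1.25)
```

The endpoints stay fixed and the midtones darken slightly, which is the direction of correction the surround effect calls for when the viewing environment gets dimmer.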

In the development of the sRGB standard for desktop 
computing, the inevitability of local, viewing-depen- 
dent correction was not appreciated. That standard 
promulgates an encoding standard with an effective 
exponent of about 0.45, different from that of video. 
We are now saddled with image data encoded with two 
standards having comparable perceptual uniformity but 
different rendering intents. Today, sRGB and video 
(Rec. 709) coding are distinguished by the applications: 
sRGB is used for still images, and Rec. 709 coding is 
used for motion video images. But image data types are 
converging, and this dichotomy in rendering intent is 
bound to become a nuisance. 



Video cameras, film cameras, motion picture cameras, 
and digital still cameras all capture images from the real 
world. When an image of an original scene or object is 
captured, it is important to introduce rendering intent. 
However, scanners used in desktop computing rarely 
scan original objects; they usually scan reproductions 
such as photographic prints or offset-printed images. 
When a reproduction is scanned, rendering intent has 
already been imposed by the first imaging process. It 
may be sensible to adjust the original rendering intent, 
but it is not sensible to introduce rendering intent that 
would be suitable for scanning a real scene or object. 
Video systems convey image data in the form of one
component that represents lightness, and two compo- 
nents that represent color, disregarding lightness. This 
scheme exploits the reduced color acuity of vision 
compared to luminance acuity: As long as lightness is 
conveyed with full detail, detail in the color compo- 
nents can be reduced by subsampling (filtering, or aver- 
aging). This chapter introduces the concepts of luma 
and chroma encoding; details will be presented in Luma 
and color differences, on page 281.

Luma 

A certain amount of noise is inevitable in any image 
digitizing system. As explained in Nonlinear image 
coding, on page 12, we arrange things so that noise has 
a perceptually similar effect across the entire tone scale 
from black to white. The lightness component is 
conveyed in a perceptually uniform manner that mini- 
mizes the amount of noise (or quantization error) intro- 
duced in processing, recording, and transmission. 

Ideally, noise would be minimized by forming a signal 
proportional to CIE luminance, as a suitably weighted 
sum of linear R, G, and B tristimulus signals. Then, this 
signal would be subjected to a transfer function that 
imposes perceptual uniformity, such as the CIE L* func- 
tion of color science that will be detailed on page 208. 
As explained in Constant luminance, on page 75, there 
are practical reasons in video to perform these opera- 
tions in the opposite order. First, a nonlinear transfer 
function - gamma correction - is applied to each of the 






The prime symbols here, and 
in following equations, denote 
nonlinear components. 


linear R, G, and B tristimulus signals: We impose the 
Rec. 709 transfer function, very similar to a square root, 
and roughly comparable to the CIE lightness ( L *) func- 
tion. Then a weighted sum of the resulting nonlinear R', 
G', and B' components is computed to form a luma 
signal (Y') representative of lightness. SDTV uses
coefficients that are standardized in Rec. 601 (see page 97):

$$ {}^{601}Y' = 0.299\,R' + 0.587\,G' + 0.114\,B' \qquad \text{(Eq 10.1)} $$

Unfortunately, luma for HDTV is coded differently from 
luma in SDTV! Rec. 709 specifies these coefficients: 


Luma is coded differently in 
large (HDTV) pictures than in 
small (SDTV) pictures! 


$$ {}^{709}Y' = 0.2126\,R' + 0.7152\,G' + 0.0722\,B' \qquad \text{(Eq 10.2)} $$
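The practical consequence is easy to see in a short sketch: the same R'G'B' triplet yields different luma under the two coefficient sets, so data coded with one set must not be decoded with the other. Both sets sum to unity, so white gets a luma of 1 either way; saturated colors are where they diverge. The constant and function names here are illustrative.

```python
REC601 = (0.299, 0.587, 0.114)      # SDTV luma coefficients
REC709 = (0.2126, 0.7152, 0.0722)   # HDTV luma coefficients

def luma(rgb_prime, weights):
    """Weighted sum of gamma-corrected R', G', B'."""
    return sum(k * c for k, c in zip(weights, rgb_prime))

green = (0.0, 1.0, 0.0)
print(luma(green, REC601), luma(green, REC709))
```

For pure green the two results are 0.587 (SDTV) and 0.7152 (HDTV), a large difference.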


CIE: Commission Internationale 
de l'Éclairage


Sloppy use of the term luminance 

The term luminance and the symbol Y were established 
by the CIE, the standards body for color science. Unfor- 
tunately, in video, the term luminance has come to 
mean the video signal representative of luminance even 
though the components of the video signal have been 
subjected to a nonlinear transfer function. At the dawn 
of video, the nonlinear signal was denoted Y', where
the prime symbol indicated the nonlinear treatment. 
But over the last 40 years the prime has not appeared 
consistently; now, both the term luminance and the 
symbol Y conflict with their CIE definitions, making 
them ambiguous! This has led to great confusion, such 
as the incorrect statement commonly found in 
computer graphics textbooks and digital image- 
processing textbooks that in the YIQ or YUV color 
spaces, the Y component is identical to CIE luminance! 


See Appendix A, YUV and luminance 
considered harmful, on page 595. 


I use the term luminance according to its CIE definition;
I use the term luma to refer to the video signal; and I
am careful to designate nonlinear quantities with 
a prime. However, many video engineers, computer 
graphics practitioners, and image-processing specialists 
use these terms carelessly. You must be careful to deter- 
mine whether a linear or nonlinear interpretation is 
being applied to the word and the symbol. 
Luma and color differences can be
computed from R', G', and B'
through a 3x3 matrix multiplication.



Y'PBPR
Y'CBCR
Y'UV
Y'IQ
Color difference coding (chroma)

In component video, three components necessary to 
convey color information are transmitted separately. 
Rather than conveying R'G'B' directly, the relatively 
poor color acuity of vision is exploited to reduce data 
capacity accorded to the color information, while
maintaining full luma detail. First, luma is formed
according to Eq 10.1 (or for HDTV, Eq 10.2). Then,
two color difference signals based upon gamma-corrected
B' minus luma and R' minus luma, B'-Y' and
R'-Y', are formed by "matrixing." Finally, subsampling 
(filtering) reduces detail in the color difference (or 
chroma) components, as I will outline on page 93. 
Subsampling incurs no loss in sharpness at any reason- 
able viewing distance. 

In component analog video, B'-Y' and R'-Y' are scaled 
to form color difference signals denoted PB and PR,
which are then analog lowpass filtered (horizontally) to 
about half the luma bandwidth. 

In component digital video, M-JPEG, and MPEG, B'-Y' 
and R'-Y' are scaled to form CB and CR components,
which can then be subsampled by a scheme such as 
4:2:2 or 4:2:0, which I will describe in a moment. 
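The scaling step can be sketched for SDTV. The text defers the exact factors to a later chapter; this sketch assumes the Rec. 601 values as I understand them: B'-Y' and R'-Y' are scaled so each spans plus or minus 0.5 (giving PB and PR), and 8-bit studio coding then places Y' at 16 through 235 and CB, CR at 128 plus or minus 112.

```python
# Assumed Rec. 601 luma weights and 8-bit studio-range coding.
KR, KB = 0.299, 0.114
KG = 1 - KR - KB

def to_ypbpr(rp, gp, bp):
    """R'G'B' in [0, 1] -> (Y', PB, PR), with PB and PR spanning +/-0.5."""
    y = KR * rp + KG * gp + KB * bp
    pb = (bp - y) / (2 * (1 - KB))   # B'-Y' spans +/-0.886
    pr = (rp - y) / (2 * (1 - KR))   # R'-Y' spans +/-0.701
    return y, pb, pr

def to_ycbcr8(rp, gp, bp):
    """8-bit Y'CBCR: Y' in 16..235, CB and CR in 16..240 centered on 128."""
    y, pb, pr = to_ypbpr(rp, gp, bp)
    return round(16 + 219 * y), round(128 + 224 * pb), round(128 + 224 * pr)
```

White codes as (235, 128, 128) and black as (16, 128, 128); fully saturated blue drives CB to its maximum of 240.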

In composite NTSC or PAL video, B'-Y' and R'-Y' are
scaled to form U and V components. Subsequently, U
and V are lowpass filtered, then combined into
a modulated chroma component, C. Luma is then
summed with modulated chroma to produce the
composite NTSC or PAL signal. Scaling of U and V is
arranged so that the excursion of the composite signal
(Y'+C) is constrained to the range -1/3 to +4/3 of the
unity excursion of luma. U and V components have no
place in component analog or component digital video.
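The -1/3 to +4/3 constraint can be checked against the saturated color bars. This sketch assumes the commonly published scale factors U = 0.492 (B'-Y') and V = 0.877 (R'-Y'), which the text does not state here, along with Rec. 601 luma weights; the amplitude of modulated chroma is taken as the length of the (U, V) vector.

```python
import math

KR, KG, KB = 0.299, 0.587, 0.114

def uv(rp, gp, bp):
    """R'G'B' -> (Y', U, V) using the assumed 0.492 and 0.877 scale factors."""
    y = KR * rp + KG * gp + KB * bp
    return y, 0.492 * (bp - y), 0.877 * (rp - y)

def composite_extremes(rp, gp, bp):
    """Trough and peak of Y' + C, where C has amplitude sqrt(U^2 + V^2)."""
    y, u, v = uv(rp, gp, bp)
    c = math.hypot(u, v)
    return y - c, y + c

bars = {"yellow": (1, 1, 0), "cyan": (0, 1, 1), "green": (0, 1, 0),
        "magenta": (1, 0, 1), "red": (1, 0, 0), "blue": (0, 0, 1)}
lo = min(composite_extremes(*rgb)[0] for rgb in bars.values())
hi = max(composite_extremes(*rgb)[1] for rgb in bars.values())
print(f"composite excursion: {lo:.3f} to {hi:.3f}")
```

Yellow and cyan reach the top of the range; red and blue reach the bottom.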

Composite NTSC video was standardized in 1953 based
upon I and Q components that were essentially U and V
components rotated 33° and axis-exchanged. It was
intended that excess detail would be removed from the
Q component so as to improve color quality. The
scheme never achieved significant deployment in
receivers, and I and Q components are now obsolete.

INTRODUCTION TO LUMA AND CHROMA 89 



[Figure 10.1 columns: R'G'B' 4:4:4; Y'CBCR 4:4:4; 4:2:2 (Rec. 601); 4:1:1 (480i DV25; D-7); 4:2:0 (JPEG/JFIF, H.261, MPEG-1)]
Figure 10.1 Chroma subsampling. A 2x2 array of R'G'B' pixels is matrixed into a luma
component Y' and two color difference components CB and CR. Color detail is reduced by subsampling
CB and CR; providing full luma detail is maintained, no degradation is perceptible. In this sketch,
samples are shaded to indicate their spatial position and extent. In 4:2:2, in 4:1:1, and in 4:2:0
used in MPEG-2, CB and CR are cosited (positioned horizontally coincident with a luma sample). In
4:2:0 used in JPEG/JFIF, H.261, and MPEG-1, CB and CR are sited interstitially (midway between
luma samples).



Chroma subsampling 

4:4:4

In Figure 10.1 above, the left-hand column sketches
a 2x2 array of R'G'B' pixels. Prior to subsampling, this
is denoted 4:4:4 R'G'B'. With 8 bits per sample, this
2x2 array of R'G'B' would occupy a total of 12 bytes.
Each R'G'B' triplet (pixel) can be transformed
("matrixed") into Y'CBCR, as shown in the second
column; this is denoted 4:4:4 Y'CBCR.

In component digital video, data capacity is reduced by
subsampling CB and CR using one of three schemes.

4:2:2

Y'CBCR studio digital video according to Rec. 601 uses
4:2:2 sampling: CB and CR components are each
subsampled by a factor of 2 horizontally. CB and CR are
sampled together, coincident (cosited) with
even-numbered luma samples. The 12 bytes of R'G'B' are
reduced to 8, effecting 1.5:1 lossy compression.

4:1:1

Certain digital video systems, such as 480i29.97 DV25,
use 4:1:1 sampling, whereby CB and CR components are
each subsampled by a factor of 4 horizontally, and
cosited with every fourth luma sample. The 12 bytes of
R'G'B' are reduced to 6, effecting 2:1 compression.
4:2:0



ITU-T Rec. H.261, known casually
as p×64 ("p times 64"), is
a videoconferencing standard.



This scheme is used in JPEG/JFIF, H.261, MPEG-1,
MPEG-2, and consumer 576i25 DVC. CB and CR are
each subsampled by a factor of 2 horizontally and
a factor of 2 vertically. The 12 bytes of R'G'B' are
reduced to 6. CB and CR are effectively centered
vertically halfway between image rows. There are two
variants of 4:2:0, having different horizontal siting. In
MPEG-2, CB and CR are cosited horizontally. In
JPEG/JFIF, H.261, and MPEG-1, CB and CR are sited
interstitially, halfway between alternate luma samples.

Figure 10.2 overleaf summarizes the various schemes. 



Subsampling effects 1.5:1 or 2:1 lossy compression. 
However, in studio terminology, subsampled video is 
referred to as uncompressed: The word compression is 
reserved for JPEG, M-JPEG, MPEG, or other techniques. 
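The byte counts quoted for each scheme follow from one rule: each of CB and CR is decimated by the product of its horizontal and vertical subsampling factors. This sketch tabulates a 2x2 array at 8 bits per sample; the figure for 4:1:1 is an average, since that scheme really works over a 4x1 block rather than a 2x2 one.

```python
# (horizontal, vertical) chroma subsampling factors for each scheme
SCHEMES = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:1:1": (4, 1), "4:2:0": (2, 2)}

def bytes_per_quad(h, v):
    """Bytes for a 2x2 pixel array at 8 bits per sample (averaged for 4:1:1)."""
    luma = 4                    # four Y' samples, never subsampled
    chroma = 2 * 4 // (h * v)   # CB and CR, each decimated by h * v
    return luma + chroma

for name, (h, v) in SCHEMES.items():
    n = bytes_per_quad(h, v)
    print(f"{name}: {n} bytes, {12 / n:g}:1 relative to 4:4:4 R'G'B'")
```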



The use of 4 as the numerical 
basis for subsampling notation is 
a historical reference to sampling 
at roughly four times the NTSC 
color subcarrier frequency. The 
4fSC rate was already in use for
composite digital video. 



Figure 10.3 Chroma subsampling
notation indicates, in the
first digit, the luma horizontal
sampling reference. The second
digit specifies the horizontal
subsampling of CB and CR with
respect to luma. The third digit
originally specified the horizontal
subsampling of CR. The
notation developed without
anticipating vertical subsampling;
a third digit of zero now
denotes 2:1 vertical subsampling
of both CB and CR.



Chroma subsampling notation 

At the outset of digital video, subsampling notation was 
logical; unfortunately, technology outgrew the notation.
In Figure 10.3 below, I strive to clarify today's
nomenclature. The first digit originally specified luma
sample rate relative to 3 3/8 MHz. HDTV was once
supposed to be described as 22:11:11! The leading
digit has, thankfully, come to be relative to the sample 
rate in use. Until recently, the initial digit was always 4, 
since all chroma ratios have been powers of two - 4, 2, 
or 1. However, 3:1:1 subsampling has recently been 
commercialized in an HDTV production system (Sony's 
HDCAM), and in the SDL mode of consumer DV (see 
page 468), so 3 may now appear as the leading digit. 




Figure 10.3 labels (example notation 4:2:2:4):
First digit: luma horizontal sampling reference (originally, luma fS as a multiple of 3 3/8 MHz)
Second digit: CB and CR horizontal factor (relative to first digit)
Third digit: same as second digit; or zero, indicating CB and CR are subsampled 2:1 vertically
Fourth digit: if present, same as luma digit; indicates alpha (key) component
Figure 10.2 Subsampling schemes are
summarized here. C indicates a [CB, CR]
sample pair when located at the same
site; otherwise (as in the DV schemes)
individual CB and CR notations indicate
the centers of the respective chroma
samples. Y' indicates the center of a luma
sample. The schemes in the left column
are progressive. The schemes in the right
column are interlaced; there, black letters
indicate top field samples and gray letters
indicate bottom field samples.
Figure 10.4 Interstitial chroma
filter for JPEG/JFIF averages
samples over a 2x2 block.
Shading represents the spatial
extent of luma samples. The
black dot indicates the effective
subsampled chroma position,
equidistant from the four luma
samples. The outline represents
the spatial extent of the result.

Figure 10.5 Cosited chroma
filter for Rec. 601, 4:2:2
causes each filtered chroma
sample to be positioned
coincident (cosited) with an
even-numbered luma sample.

[Figure 10.6 filter weights: 1/8, 1/4, 1/8 in each of two rows]

Figure 10.6 Cosited chroma
filter for MPEG-2, 4:2:0
produces a filtered result
sample that is cosited horizontally,
but sited interstitially in
the vertical dimension.
Chroma subsampling filters

In chroma subsampling, the encoder discards selected 
color difference samples after filtering. A decoder 
approximates the missing samples by interpolation. 

To perform 4:2:0 subsampling with minimum computation,
some systems simply average CB over a 2x2 block,
and average CR over the same 2x2 block, as sketched in
Figure 10.4 in the margin. To interpolate the missing
chroma samples prior to conversion back to R'G'B',
low-end systems simply replicate the subsampled CB
and CR values throughout the 2x2 quad. This technique
is ubiquitous in JPEG/JFIF stillframes in
computing, and is used in M-JPEG, H.261, and
MPEG-1. This simple averaging process causes subsam- 
pled chroma to take an effective horizontal position 
halfway between two luma samples, what I call intersti- 
tial siting, not the cosited position standardized for 
studio video. 

A simple way to perform 4:2:2 subsampling with
horizontal cositing as required by Rec. 601 is to use weights
of [1/4, 1/2, 1/4], as sketched in Figure 10.5. 4:2:2
subsampling has the advantage of no interaction with
interlaced scanning.

A cosited horizontal filter can be combined with
[1/2, 1/2] vertical averaging, as sketched in Figure 10.6,
to implement 4:2:0 as used in MPEG-2.
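The simple filters just described can be sketched directly. The Python below operates on plain lists of values for one color difference component at a time; the function names are my own, and replicating the edge sample at the image border is an assumption, since the text does not say how edges are handled. As the text goes on to explain, filters this simple are adequate for stillframes but not for high-end moving video.

```python
def interstitial_420(c):
    """JPEG/JFIF-style 4:2:0: average each 2x2 block (c is a list of rows)."""
    return [[(c[y][x] + c[y][x + 1] + c[y + 1][x] + c[y + 1][x + 1]) / 4
             for x in range(0, len(c[0]) - 1, 2)]
            for y in range(0, len(c) - 1, 2)]

def cosited_422(row):
    """Rec. 601-style 4:2:2: [1/4, 1/2, 1/4] centered on even-numbered samples.
    Edge samples are replicated (an assumption)."""
    n = len(row)
    out = []
    for x in range(0, n, 2):
        left = row[max(x - 1, 0)]
        right = row[min(x + 1, n - 1)]
        out.append(0.25 * left + 0.5 * row[x] + 0.25 * right)
    return out

def cosited_420(c):
    """MPEG-2-style 4:2:0: [1/2, 1/2] vertical average, then cosited horizontal filter."""
    return [cosited_422([(a + b) / 2 for a, b in zip(c[y], c[y + 1])])
            for y in range(0, len(c) - 1, 2)]

def replicate_420(sub):
    """Low-end reconstruction: copy each subsampled value throughout its 2x2 quad."""
    out = []
    for r in sub:
        wide = [v for v in r for _ in (0, 1)]
        out.extend([wide, list(wide)])
    return out
```

Combining the [1/2, 1/2] vertical average with the [1/4, 1/2, 1/4] horizontal filter yields the 1/8, 1/4, 1/8 weights over two rows shown for Figure 10.6.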

Simple averaging filters like those of Figures 10.4, 10.5, 
and 10.6 have acceptable performance for stillframes, 
where any alias components that are generated remain 
stationary, or for desktop-quality video. However, in
a moving image, an alias component introduced by
poor filtering is liable to move at a rate different from
the associated scene elements, and thereby produce
a highly objectionable artifact. High-end digital video
equipment uses sophisticated subsampling filters,
where the subsampled CB and CR of a 2x1 pair in 4:2:2
(or of a 2x2 quad in 4:2:0) take contributions from
several surrounding samples. The relationship of filter
weights, frequency response, and filter performance will
be detailed in Filtering and sampling, on page 141.
The video literature often calls
these quantities chrominance. That 
term has a specific meaning in 
color science, so in video I prefer 
the term modulated chroma. 



See Introduction to composite NTSC 
and PAL, on page 103. Concerning 
SECAM, see page 576. 



Chroma in composite NTSC and PAL 

I introduced the color difference components PBPR and
CBCR, often called chroma components. They accompany
luma in a component video system. I also introduced
UV and IQ components; these are intermediate
quantities in the formation of modulated chroma. 

Historically, insufficient channel capacity was available 
to transmit three color components separately. The 
NTSC technique was devised to combine the three 
color components into a single composite signal; the 
PAL technique is both a refinement of NTSC and an 
adaptation of NTSC to 576i scanning. (In SECAM, the
three color components are also combined into one 
signal. SECAM is a form of composite video, but the 
technique has little in common with NTSC and PAL, 
and it is of little commercial importance today.) 

Encoders traditionally started with R'G'B' components. 
Modern analog encoders usually start with Y'PBPR
components; digital encoders (sometimes called 4:2:2-to-4fsc
converters) usually start with Y'CBCR components.
NTSC or PAL encoding involves these steps:

• Component signals are matrixed and conditioned to
form color difference signals U and V (or I and Q).

• U and V (or I and Q) are lowpass-filtered; then quadrature
modulation imposes the two color difference
signals onto an unmodulated color subcarrier, to
produce a modulated chroma signal, C.

• Luma and chroma are summed. In studio video,
summation exploits the frequency-interleaving principle.
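The encoding steps can be sketched numerically for a single instant. This is a simplified illustration, not a studio encoder: it assumes the commonly cited scale factors U = 0.492111(B' - Y') and V = 0.877283(R' - Y'), the modulation convention C = U sin ωt + V cos ωt, a subcarrier of 315/88 MHz, and constant U and V (so lowpass filtering is omitted). Burst and sync are also omitted, and the function name composite_sample is hypothetical.

```python
import math

# Illustrative constants (commonly cited values; assumptions, not from this text).
FSC = 315e6 / 88            # NTSC color subcarrier, about 3.579545 MHz
U_SCALE, V_SCALE = 0.492111, 0.877283

def composite_sample(r, g, b, t):
    """Composite value at time t (seconds) for constant gamma-corrected R'G'B'."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma, Rec. 601 weights
    u = U_SCALE * (b - y)                    # color difference signals
    v = V_SCALE * (r - y)
    wt = 2 * math.pi * FSC * t
    c = u * math.sin(wt) + v * math.cos(wt)  # quadrature-modulated chroma
    return y + c                             # luma and chroma summed

# A neutral gray has zero color difference, hence no chroma:
print(composite_sample(0.5, 0.5, 0.5, 1e-7))   # approximately 0.5
```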

Composite NTSC and PAL signals were historically 
analog; nowadays, they can be digital (4fsc), though as
I mentioned in Video system taxonomy, on page 62, 
composite video is being rapidly supplanted by compo- 
nent video in the studio, in consumers' premises, and in 
industrial applications. For further information, see 
Introduction to composite NTSC and PAL, on page 103. 
The notation CCIR is often
wrongly used to denote 576i25
scanning. The former CCIR (now
ITU-R) standardized many scanning
systems, not just 576i25.



In Raster scanning, on page 51, I introduced the 
concepts of raster scanning; in Introduction to luma and 
chroma, on page 87, I introduced the concepts of color 
coding in video. This chapter combines the concepts of 
raster scanning and color coding to form the basic technical
parameters of 480i and 576i systems. This
chapter concerns modern systems that use component
color: Y'CBCR (Rec. 601) or Y'PBPR. In Introduction to
composite NTSC and PAL, on page 103, I will describe 
NTSC and PAL composite video encoding. 

Scanning standards 

Two scanning standards are in use for conventional 
analog television broadcasting in different parts of the 
world. The 480i29.97 system is used primarily in North
America and Japan, and today accounts for roughly ¼
of all television receivers. The 576i25 system is used
primarily in Europe, Asia, Australia, Korea, and Central
America, and accounts for roughly ¾ of all television
receivers. 480i29.97 (or 525/59.94/2:1) is colloquially
referred to as NTSC, and 576i25 (or 625/50/2:1) as
PAL; however, the terms NTSC and PAL properly apply
to color encoding and not to scanning standards. It is 
obvious from the scanning nomenclature that the line 
counts and field rates differ between the two systems: 

In 480i29.97 video, the field rate is exactly 60/1.001 Hz;
in 576i25, the field rate is exactly 50 Hz.
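Exact rational arithmetic makes the two rates concrete. This sketch uses Python's fractions module; the frame and line rates follow from two fields per frame and from the 525 and 625 total line counts in the scanning designations.

```python
from fractions import Fraction

# Exact field, frame, and line rates of the two SDTV scanning standards.
field_480i = Fraction(60) / Fraction(1001, 1000)   # 60/1.001 Hz
field_576i = Fraction(50)                          # exactly 50 Hz

frame_480i = field_480i / 2        # two interlaced fields per frame
frame_576i = field_576i / 2

line_480i = 525 * frame_480i       # 525 total lines per frame
line_576i = 625 * frame_576i       # 625 total lines per frame

print(float(field_480i), float(frame_480i))   # about 59.94 and 29.97
print(frame_576i, line_576i)                  # 25 15625
print(float(line_480i))                       # about 15734.27
```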



Several different standards for 480i29.97 and 576i25
digital video are sketched in Figure 11.1 overleaf. 
Figure 11.1 SDTV digital video rasters for 4:3 aspect ratio. 480i29.97 scanning is at the left,
576i25 at the right. The top row shows square sampling ("square pixels"). The middle row shows
sampling at the Rec. 601 standard sampling frequency of 13.5 MHz. The bottom row shows
sampling at four times the color subcarrier frequency (4fsc). Above each diagram is its count of
samples per total line (STL); ratios among STL values are written vertically in bold numerals.



Monochrome systems having 
405/50 and 819/50 scanning 
were once used in Britain and 
France, respectively, but transmit- 
ters for these systems have now 
been decommissioned. 



See PAL-M, PAL-N on page 575, 
and SECAM on page 576. 
Consumer frustration with a diver- 
sity of functionally equivalent stan- 
dards has led to proliferation of 
multistandard TVs and VCRs in 
countries using these standards. 



Analog broadcast of 480i usually uses NTSC color
coding with a color subcarrier of about 3.58 MHz;
analog broadcast of 576i usually uses PAL color coding
with a color subcarrier of about 4.43 MHz. It is important
to use a notation that distinguishes scanning from
color, because other combinations of scanning and
color coding are in use in large and important regions of
the world. Brazil uses PAL-M, which has 480i scanning
and PAL color coding. Argentina uses PAL-N, which has
576i scanning and a 3.58 MHz color subcarrier nearly
identical to NTSC's subcarrier. In France, Russia, and
other countries, SECAM is used. Production equipment
is no longer manufactured for any of these obscure
standards: Production in these countries is done using
480i or 576i studio equipment, either in the component
domain or in 480i NTSC or 576i PAL. These studio
signals are then transcoded prior to broadcast: The color
encoding is altered (for example, from PAL to
SECAM) without altering scanning.
Figure 11.2 SDTV sample rates are shown for six different 4:3 standards, along with the usual
color coding for each standard. There is no realtime studio interface standard for square-sampled
SDTV. The D-1 and D-2 designations properly apply to videotape formats.



Figure 11.1 indicates STL and SAL for each standard. The
SAL values are the result of some complicated issues to
be discussed in Choice of SAL and SPW parameters, on
page 325. For details concerning my reference to 483
active lines (LA) in 480i systems, see Picture lines, on
page 324.



ITU-R Rec. BT.601-5, Studio
encoding parameters of digital television
for standard 4:3 and widescreen
16:9 aspect ratios.



Figure 11.2 above shows the standard 480i29.97 and
576i25 digital video sampling rates, and the color
coding usually associated with each of these standards.
The 4:2:2 Y'CBCR system for SDTV is standardized in
Recommendation BT.601 of the ITU Radiocommunication
Sector (formerly CCIR). I call it Rec. 601.



With one exception, all of the sampling systems in 
Figure 11.2 have a whole number of samples per total
line; these systems are line-locked. The exception is
composite 4fsc PAL sampling, which has a noninteger
number (1135 + 4/625) of samples per total line; this
creates a huge nuisance for the system designer.
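Samples per total line is simply sampling rate divided by line rate, so the line-locked property can be checked with exact fractions. This sketch assumes the PAL-B/G/H/I subcarrier of 4.43361875 MHz; the line rates are those of 480i and 576i.

```python
from fractions import Fraction

# Samples per total line (S_TL) = sampling rate / line rate.
f_h_480 = Fraction(15_750_000, 1001)   # 525 lines x 30/1.001 frames/s
f_h_576 = Fraction(15_625)             # 625 lines x 25 frames/s
f_s_601 = Fraction(13_500_000)         # Rec. 601 sampling rate, Hz
f_4sc_pal = Fraction(17_734_475)       # 4 x 4.43361875 MHz (PAL-B/G/H/I)

print(f_s_601 / f_h_480)    # 858: whole number, line-locked
print(f_s_601 / f_h_576)    # 864: whole number, line-locked
stl_4sc_pal = f_4sc_pal / f_h_576
print(stl_4sc_pal)          # 709379/625, i.e. 1135 + 4/625: not line-locked
```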
Table 11.1 Gratuitous differences between 480i and 576i

System                               480i29.97                          576i25
Picture:sync ratio                   10:4*                              7:3
Setup, percent                       7.5†                               0
Count of equalization, broad pulses  6                                  5
Line number 1, and 0V, defined at:   First equalization pulse of field  First broad pulse of frame
Bottom picture line in:              First field                        Second field

* The EBU N10 component analog interface for Y'PBPR, occasionally used for 480i, has 7:3 picture-to-sync ratio.
† 480i video in Japan, and the EBU N10 component analog interface, have zero setup. See page 327.



480i and 576i have gratuitous differences in many technical
parameters, as summarized in Table 11.1 above.



Figures 11.3, 11.4, and 11.5 depict 
just the image array (i.e., the active 
samples), without vertical blanking 
lines. MPEG makes no provision for 
halflines. 



Different treatment of interlace between 480i and 576i
imposes different structure onto the picture data. The
differences cause headaches in systems such as MPEG
that are designed to accommodate both 480i and 576i
images. In Figures 11.3 and 11.4 below, I show how
field order, interlace nomenclature, and image structure
are related. Figure 11.5 at the bottom of this page
shows how MPEG-2 identifies each field as either top or
bottom. In 480i video, the bottom field is the first field
of the frame; in 576i, the top field is first.





Figure 11.3 Interlacing in 480i. The first field (historically
called odd, here denoted 1) starts with a full picture line, and
ends with a left-hand halfline containing the bottom of the
picture. The second field (here dashed, historically called even),
transmitted about 1/60 s later, starts with a right-hand halfline
containing the top of the picture; it ends with a full picture line.



Figure 11.4 Interlacing in 576i. The first field includes a right-
hand halfline containing the top line of the picture, and ends
with a full picture line. The second field, transmitted 1/50 s
later, starts with a full line, and ends with a left-hand halfline
that contains the bottom of the picture. (In 576i terminology,
the terms odd and even are rarely used, and are best avoided.)



Figure 11.5 Interlacing in MPEG-2 identifies a picture
according to whether it contains the top or bottom picture line
of the frame. Top and bottom fields are displayed in the order
that they are coded in an MPEG-2 data stream. For frame-
coded pictures, display order is determined by a one-bit flag,
top_field_first, typically asserted for 576i and negated for 480i.
Figure 11.6 Widescreen SDTV sampling uses the standard
13.5 MHz sampling rate, effectively stretching samples horizontally
by 4/3 compared to the 4:3 aspect ratio base standard.



Widescreen (16:9) SDTV 

Television programming has historically been produced 
in 4:3 aspect ratio. However, wide aspect ratio
programming (originated on film, HDTV, or widescreen
SDTV) is now economically important. Also,
there is increasing consumer interest in widescreen
programming. Consumers dislike the blank areas of the
display that result from letterboxing. Consequently,
SDTV standards are being adapted to handle 16:9
aspect ratio. Techniques to accomplish this are known
as widescreen SDTV. That term is misleading, though:
Because there is no increase in pixel count, a so-called
widescreen SDTV picture cannot be viewed with
a picture angle substantially wider than regular (4:3)
SDTV. (See page 43.) So widescreen SDTV does not
deliver HDTV's major promise, that of dramatically
wider viewing angle, and a more accurate term would
be wide aspect ratio SDTV.



The technique of Figure 11.6 is 
used on many widescreen DVDs. 
A DVD player can be configured 
to subsample vertically by a factor 
of ¾, to letterbox such a recorded
image for 4:3 display. (Some DVDs 
are recorded letterboxed.) 



The latest revision (-5) of Rec. 601 standardizes an 
approach to widescreen SDTV sketched in Figure 11.6 
above. The standard 13.5 MHz luma sampling rate for 
480i or 576i component video is used, but for an
image at 16:9 aspect ratio. Each sample is stretched
horizontally by a ratio of 4/3 compared to the 4:3 aspect
ratio of video. Existing 480i or 576i component video
infrastructure can be used directly. (Some camcorders 
can be equipped with anamorphic lenses to produce 
this form of widescreen SDTV through optical means.) 

A second approach, not sketched here, uses a higher 
sampling rate of 18 MHz (i.e., 4/3 times 13.5 MHz). This
scheme offers somewhat increased pixel count 
compared to 4:3 systems; however, it is rarely used. 
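The widescreen figures above follow from simple ratios. This sketch only restates that arithmetic: the horizontal stretch is (16/9)/(4/3), the higher sampling rate is that factor times 13.5 MHz, and the letterbox line counts apply the vertical ¾ subsampling mentioned in the margin note.

```python
from fractions import Fraction

# The 16:9 picture occupies the same samples as the 4:3 picture,
# so each sample is widened by (16/9) / (4/3).
stretch = Fraction(16, 9) / Fraction(4, 3)
print(stretch)                                   # 4/3

# Keeping the 4:3 sample shape instead requires a 4/3 higher rate.
print(stretch * Fraction(27, 2), "MHz")          # 18 MHz

# Letterboxing for 4:3 display: subsample vertically by 3/4.
for lines in (480, 576):
    print(lines, "->", lines * Fraction(3, 4))   # 360 and 432 picture lines
```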

Progressive SDTV (480p/483p)

A progressive 483p59.94 studio standard has been
established in SMPTE 293M, with parameters similar to
Rec. 601, but without interlace and with twice the data
rate. Some people consider 483p to provide high definition.
Unquestionably, 483p has higher quality than
480i, but I cannot characterize 483p as HDTV. Japan's
EDTV-II broadcast system is based upon 483p scanning.
Provisions are made for 480p in the ATSC standards
for digital television. One major U.S. network has
broadcast in 480p29.97, one of the ATSC formats.



Figure 11.7 Chroma subsampling
in 4:2:0p alternates
frame-to-frame in a two-frame
sequence, even though
scanning is progressive.



480p and 483p systems have either 4:2:2 or 4:2:0 
chroma subsampling. The 4:2:2p variant is a straightfor- 
ward extension of Rec. 601 subsampling to progressive 
scanning. The 4:2:0 variant differs from 4:2:0 used in
JPEG/JFIF, and differs from 4:2:0 used in MPEG-2. This
scheme is denoted 4:2:0p. Unfortunately, this notation
appears to follow the naming convention of MPEG-2's
4:2:2 profile (denoted 422P); however, in 4:2:0p, the p
is for progressive, not profile! 

Figure 11.7 depicts 4:2:0p chroma subsampling used in
483p. Although frames are progressive, chroma
subsampling is not identical in every frame. Frames are 
denoted 0 and 1 in an alternating sequence. Chroma 
samples in frame 0 are positioned vertically coincident 
with even-numbered image rows; chroma samples in 
frame 1 are cosited with odd-numbered image rows. 
Compare this sketch with Figure 10.1, on page 90. 
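The alternating vertical siting described above can be expressed as a one-line rule. This minimal sketch restates only that rule (frame 0 on even image rows, frame 1 on odd rows); the function name chroma_rows_420p is hypothetical, and horizontal siting is not modeled.

```python
def chroma_rows_420p(frame_index, image_rows):
    """Image rows that carry chroma samples in a given frame (4:2:0p)."""
    parity = frame_index % 2        # frames alternate 0, 1, 0, 1, ...
    return list(range(parity, image_rows, 2))

print(chroma_rows_420p(0, 8))   # [0, 2, 4, 6]
print(chroma_rows_420p(1, 8))   # [1, 3, 5, 7]
```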



Quasi-interlace in consumer SDTV is 
comparable to progressive 
segmented-frame (PsF) in HDTV, 
though at 25 or 29.97 frames per 
second instead of 24. See page 62. 



Some recent cameras implement a progressive mode
(in DVC camcorders, sometimes called movie mode, or
frame mode) whereby images are captured at
480p29.97 (720×480) or 576p25 (720×576). The DV
compression algorithm detects no motion between the 
fields, so compression effectively operates on progres- 
sive frames. Interlace is imposed at the analog inter- 
face; this is sometimes called quasi-interlace. Excellent 
stillframes result; however, motion portrayal suffers. 
Square and nonsquare sampling

Computer graphics equipment usually employs square 
sampling: that is, a sampling lattice where pixels are
equally spaced horizontally and vertically. Square
sampling of 480i and 576i is diagrammed in the top
rows of Figures 11.1 and 11.2 on page 97. 



See Table 13.1, on page 114, and
the associated discussion. 



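[Margin figure: active samples under square sampling of 480i and 576i.]
$648 \approx 780 \cdot \left(1 - \frac{10.7\ \mu\mathrm{s}}{63.555\ \mu\mathrm{s}}\right)$
$767 = 944 \cdot \frac{52\ \mu\mathrm{s}}{64\ \mu\mathrm{s}}$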



Although ATSC's notorious Table 3 includes a 640×480
square-sampled image, no studio standard or realtime
interface standard addresses square sampling of SDTV.
For desktop video applications, I recommend sampling
480i video with exactly 780 samples per total line, for
a nominal sample rate of 12 3/11 MHz, that is,
12.272727 MHz. To accommodate full picture width in
the studio, 648 samples are required; often, 640
samples are used with 480 picture lines. For square
sampling of 576i video, I recommend using exactly 944
samples per total line, for a sample rate of exactly
14.75 MHz.
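The recommended square-sampling rates are whole numbers of samples per total line multiplied by the line rate. This sketch checks both figures with exact fractions, taking the 480i line rate as 525 × 30/1.001 Hz and the 576i line rate as 625 × 25 Hz.

```python
from fractions import Fraction

# Square-sampling rate = samples per total line x line rate.
f_h_480 = Fraction(15_750_000, 1001)   # 525 x 30/1.001 Hz, about 15.734 kHz
f_h_576 = Fraction(15_625)             # 625 x 25 Hz

f_sq_480 = 780 * f_h_480
f_sq_576 = 944 * f_h_576

print(f_sq_480 / 10**6)          # 135/11, i.e. 12 3/11 MHz
print(float(f_sq_480) / 1e6)     # about 12.272727
print(f_sq_576 / 10**6)          # 59/4, i.e. exactly 14.75 MHz
```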



$\frac{4f_{SC,\,\mathrm{PAL\text{-}I}}}{f_{S,\,601}} = \frac{709379}{540000}$



MPEG-1, MPEG-2, DVD, and DVC all conform to 
Rec. 601, which specifies nonsquare sampling. Rec. 601 
sampling of 480i and 576i is diagrammed in the middle
rows of Figures 11.1 and 11.2.

Composite digital video systems sample at four times 
the color subcarrier frequency (4fsc), resulting in
nonsquare sampling whose parameters are shown in the
bottom rows of Figures 11.1 and 11.2. (As I stated on
page 94, composite 4fsc systems are in decline.)

In 480i, the sampling rates for square sampling,
Rec. 601, and 4fsc are related by the ratio 30:33:35.
The pixel aspect ratio of Rec. 601 480i is exactly 10/11;
the pixel aspect ratio of 4fsc 480i is exactly 6/7.
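Because all three 480i schemes share the same line rate, their rates are in the ratio of their samples per total line (780, 858, and 910). This sketch reduces those counts and derives each pixel aspect ratio (width relative to height) from the square-sampled count.

```python
from fractions import Fraction
from math import gcd

# 480i samples per total line (S_TL) for the three sampling schemes.
stl = {"square": 780, "rec601": 858, "4fsc": 910}

g = gcd(gcd(stl["square"], stl["rec601"]), stl["4fsc"])
print([n // g for n in stl.values()])      # [30, 33, 35]

# A sample taken at a higher rate is proportionally narrower than square.
for name, n in stl.items():
    print(name, Fraction(stl["square"], n))   # 1, 10/11, 6/7
```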

In 576i, the sampling rates for square sampling and
4:2:2 are related by the ratio 59:54, so the pixel aspect
ratio of 576i Rec. 601 is precisely 59/54. Rec. 601 and
4fsc sample rates are related by the ratio in the margin,
which is fairly impenetrable to digital hardware. 
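The 576i ratios can be verified the same way. This sketch uses 944 and 864 samples per total line for square and Rec. 601 sampling, and assumes a 4fsc PAL-B/G/H/I rate of 17.734475 MHz.

```python
from fractions import Fraction

# 576i: square sampling uses 944 samples per total line; Rec. 601 uses 864.
par_601_576 = Fraction(944, 864)
print(par_601_576)                    # 59/54

# Ratio of the 4fsc PAL-B/G/H/I rate to the Rec. 601 rate.
f_4sc_pal = Fraction(17_734_475)      # Hz
f_s_601 = Fraction(13_500_000)        # Hz
print(f_4sc_pal / f_s_601)            # 709379/540000
```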



Most of this nonsquare sampling business has been put 
behind us: HDTV studio standards call for square
sampling, and it is difficult to imagine any future studio
standard being established with nonsquare sampling. 

Resampling 

Analog video can be digitized with square sampling 
simply by using an appropriate sample frequency. 
However, SDTV already digitized at a standard digital 
video sampling rate such as 13.5 MHz must be resampled
(or interpolated, or in PC parlance, scaled) when
entering the square-sampled desktop video domain. If
video samples at 13.5 MHz are passed to a computer
graphics system and then treated as if the samples are
equally spaced vertically and horizontally, then picture
geometry will be distorted. Rec. 601 480i video will
appear horizontally stretched; Rec. 601 576i video will
appear squished. In desktop video, resampling is often
needed in both axes.

The ratio 10/11 relates 480i Rec. 601 to square
sampling: Crude resampling could be accomplished by
simply dropping every eleventh sample across each scan
line! Crude resampling from 576i Rec. 601 to square
sampling could be accomplished by replicating
5 samples in every 54 (perhaps in the pattern
11-R-11-R-11-R-11-R-10-R, where R denotes
a repeated sample). However, such sample dropping 
and stuffing techniques will introduce aliasing. 
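The two crude schemes are easy to state in code. This sketch exists only to make the 10/11 and 59/54 ratios concrete; as the text warns, both methods alias. The function names are hypothetical, and a partial final group in the replication scheme simply repeats its last sample.

```python
def drop_every_eleventh(line):
    """480i Rec. 601 -> square: keep 10 of every 11 samples."""
    return [s for i, s in enumerate(line) if i % 11 != 10]

def replicate_5_in_54(line):
    """576i Rec. 601 -> square: repeat 5 samples in every 54, using the
    pattern 11-R-11-R-11-R-11-R-10-R (R = a repeated sample)."""
    groups = [11, 11, 11, 11, 10]
    out, i = [], 0
    while i < len(line):
        for g in groups:
            chunk = line[i:i + g]
            if not chunk:
                break
            out.extend(chunk)
            out.append(chunk[-1])      # R: repeat the last sample of the group
            i += g
    return out

print(len(drop_every_eleventh(list(range(858)))))   # 780
print(len(replicate_5_in_54(list(range(864)))))     # 944
```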

I recommend that you use a more sophisticated inter- 
polator, of the type explained in Filtering and sampling, 
on page 141. Resampling could potentially be
performed along either the vertical axis or the hori- 
zontal (transverse) axis; horizontal resampling is the 
easier of the two, as it processes pixels in raster order 
and therefore does not require any linestores. 
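As a step up from dropping or repeating samples, here is a minimal horizontal resampler using linear interpolation. This is a substitute chosen for brevity, not the interpolator design of the filtering chapter; linear interpolation still softens the picture and does not fully suppress aliasing. It maps the first and last samples of the input onto those of the output (an assumption), and like any horizontal resampler it works one scan line at a time, so it needs no linestores. The function name resample_line is hypothetical.

```python
def resample_line(line, out_len):
    """Resample one scan line to out_len samples by linear interpolation."""
    n = len(line)
    if out_len == 1 or n == 1:
        return [line[0]] * out_len
    scale = (n - 1) / (out_len - 1)   # map output positions onto input positions
    out = []
    for j in range(out_len):
        x = j * scale
        i = min(int(x), n - 2)        # index of the left neighbour
        frac = x - i                  # fractional distance to the right neighbour
        out.append((1 - frac) * line[i] + frac * line[i + 1])
    return out

print(resample_line([0, 10], 3))   # [0.0, 5.0, 10.0]
```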
NTSC stands for National Television
System Committee. PAL stands
for Phase Alternate Line (or 
according to some sources, Phase 
Alternation at Line rate, or perhaps 
even Phase Alternating Line). 

SECAM is a composite technique of 
sorts, though it has little in common 
with NTSC and PAL. See page 576. 



In component video, the three color components are 
kept separate. Video can use R'G'B' components 
directly, but three signals are expensive to record, 
process, or transmit. Luma (Y') and color difference
components based upon B'-Y' and R'-Y' can be used to 
enable subsampling: Luma is maintained at full data 
rate, and the two color difference components are 
subsampled. Even subsampled, video has a fairly high 
information rate (bandwidth, or data rate). To reduce 
the information rate further, composite NTSC and PAL 
color coding uses quadrature modulation to combine 
two color difference components into a modulated 
chroma signal, then uses frequency interleaving to 
combine luma and modulated chroma into a composite 
signal having roughly 1/3 the data rate (or, in an analog
system, 1/3 the bandwidth) of R'G'B'.



Composite encoding was invented to address three 
main needs. First, there was a need to limit transmis- 
sion bandwidth. Second, it was necessary to enable 
black-and-white receivers already deployed by 1953 to 
receive color broadcasts with minimal degradation. 
Third, it was necessary for newly introduced color 
receivers to receive standard black-and-white broad- 
casts. Composite encoding was necessary in the early 
days of television, and it has proven highly effective for 
broadcast. NTSC and PAL are used in billions of 
consumer electronic devices, and broadcasting of NTSC 
and PAL is entrenched. 
By NTSC and PAL, I do not
mean 480i and 576i, or
525/59.94 and 625/50!



When I use the term PAL in this 
chapter, I refer only to 576i
PAL-B/G/H/I. Variants of PAL
used for broadcasting in South
America are discussed in Analog
NTSC and PAL broadcast standards,
on page 571. PAL variants
in consumer devices are
discussed in Consumer analog 
NTSC and PAL, on page 579. 



Composite NTSC or PAL encoding has three major 
disadvantages. First, encoding introduces some degree 
of mutual interference between luma and chroma.
Once a signal has been encoded into composite form,
the NTSC or PAL footprint is imposed: Cross-luma and 
cross-color errors are irreversibly impressed on the 
signal. Second, it is impossible to directly perform many 
processing operations in the composite domain; even to 
reposition or resize a picture requires decoding, 
processing, and reencoding. Third, digital compression 
techniques such as JPEG and MPEG cannot be directly 
applied to composite signals, and the artifacts of NTSC 
and PAL encoding are destructive to MPEG encoding. 

The bandwidth to carry separate color components is 
now easily affordable; composite encoding is no longer 
necessary in the studio. To avoid the NTSC and PAL 
artifacts, to facilitate image manipulation, and to enable 
compression, composite video has been superseded by 
component video, where three color components R'G'B', 
or Y'CBCR (in digital systems), or Y'PBPR (in analog
systems), are kept separate. I hope you can manage to 
avoid composite NTSC and PAL, and skip this chapter! 

The terms NTSC and PAL properly denote color 
encoding standards. Unfortunately, they are often used 
incorrectly to denote scanning standards. PAL encoding 
is used with both 576i scanning (with two different
subcarrier frequencies) and 480i scanning (with a third
subcarrier frequency); PAL alone is ambiguous. 

In principle, NTSC or PAL color coding could be used 
with any scanning standard. However, in practice, NTSC 
and PAL are used only with 480i and 576i scanning,
and the parameters of NTSC and PAL encoding are opti- 
mized for those scanning systems. This chapter intro- 
duces composite encoding. Three later chapters detail 
the principles: NTSC and PAL chroma modulation, on 
page 335; NTSC and PAL frequency interleaving, on 
page 349; and NTSC Y'IQ system, on page 365. Studio
standards are detailed in 480i NTSC composite video, on
page 511, and 576i PAL composite video, on page 529.
NTSC and PAL encoding

NTSC or PAL encoding involves these steps: 

• R'G'B' component signals are matrixed and filtered, or 
Y'CBCR or Y'PBPR components are scaled and filtered, to
form luma (Y') and color difference signals (U and V, or
in certain NTSC systems, I and Q).

• U and V (or I and Q) color difference signals are modulated
onto a pair of intimately related continuous-wave
color subcarriers, typically at a frequency of about
3.58 MHz in 480i29.97 or 4.43 MHz in 576i25, to
produce a modulated chroma signal, C. (See the left
side of Figure 12.1 overleaf.)

• Luma and modulated chroma are summed to form
a composite NTSC or PAL signal. (See the right side of
Figure 12.1.) Summation of luma and chroma is liable to
introduce a certain degree of mutual interference, 
called cross-luma and cross-color; these artifacts can be 
minimized through frequency interleaving, to be 
described. 

The S-video interface bypasses the third step. The 
S-video interface transmits luma and modulated chroma 
separately: They are not summed, so cross-luma and 
cross-color artifacts are avoided. 

NTSC and PAL decoding 

NTSC or PAL decoding involves these steps: 

• Luma and modulated chroma are separated. Crude 
separation can be accomplished using a notch filter. 
Alternatively, frequency interleaving can be exploited to 
provide greatly improved separation; in NTSC, such
a separator is a comb filter. (In an S-video interface,
luma and modulated chroma are already separate.)

• Chroma is demodulated to produce UV, IQ, PBPR, or
CBCR baseband color difference components.

• If R'G'B' components are required, the baseband color 
difference components are interpolated, then luma and 
the color difference components are dematrixed. 
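Demodulation can be illustrated with synchronous detection. This minimal sketch assumes the same modulation convention as before (C = U sin ωt + V cos ωt), four samples per subcarrier cycle (mimicking 4fsc sampling), and constant U and V; averaging over whole cycles stands in for a proper lowpass filter. The function name demodulate and the test values are hypothetical.

```python
import math

def demodulate(chroma, samples_per_cycle):
    """Recover (U, V) by multiplying by 2 sin and 2 cos and averaging."""
    n = len(chroma)
    u = v = 0.0
    for k, c in enumerate(chroma):
        wt = 2 * math.pi * k / samples_per_cycle
        u += 2 * c * math.sin(wt)
        v += 2 * c * math.cos(wt)
    return u / n, v / n   # averaging over whole cycles acts as a crude lowpass

# Hypothetical modulated chroma: ten whole cycles at 4 samples per cycle.
U, V = 0.3, -0.2
chroma = [U * math.sin(2 * math.pi * k / 4) + V * math.cos(2 * math.pi * k / 4)
          for k in range(4 * 10)]
print(demodulate(chroma, 4))   # approximately (0.3, -0.2)
```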
Figure 12.1 NTSC chroma modulation and frequency interleaving are applied, successively, to
encode luma and a pair of color difference components into NTSC composite video. First, the two 
color difference signals are modulated onto a color subcarrier. If the two color differences are inter- 
preted in polar coordinates, hue angle is encoded as subcarrier phase, and saturation is encoded as 
subcarrier amplitude. (Burst, a sample of the unmodulated subcarrier, is included in the composite 
signal.) Then, modulated chroma is summed with luma. Frequency interleaving leads to line-by-line
phase inversion of the unmodulated color subcarrier, thence to the modulated subcarrier.
Summation of adjacent lines tends to cause modulated chroma to cancel, and luma to average. 
Figure 12.2 S-video
interface involves 
chroma modulation; 
however, luma and 
modulated chroma 
traverse separate paths 
across the interface, 
instead of being summed. 
S-video interface

S-video involves NTSC or PAL chroma modulation; 
however, luma and modulated chroma traverse sepa- 
rate paths across the interface instead of being 
summed. Figure 12.2 above sketches the encoder and 
decoder arrangement. S-video is common in consumer 
and desktop video equipment, but is rare in the studio, 
where either component or composite video is gener- 
ally used. 

Frequency interleaving 

When luma and modulated chroma are summed, 
a certain amount of mutual interference is introduced. 
Interference is minimized by arranging for frequency 
interleaving, which is achieved when the color subcar- 
rier frequency and the line rate are coherent - that is, 
when the unmodulated color subcarrier is phase-locked 
to a carefully chosen rational multiple of the line rate:
half the line rate for NTSC, and ¼ the line rate in PAL.
Coherence is achieved in the studio by deriving both 
sync and color subcarrier from a single master clock. 



In PAL, all but the most sophisti- 
cated comb filters separate U and V, 
not luma and chroma. See page 341.



In NTSC, frequency interleaving enables use of a comb 
filter to separate luma and chroma: Adjacent lines are 
summed (to form vertically averaged luma) and differ- 
enced (to form vertically averaged chroma), as 
suggested at the bottom right of Figure 12.1.
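The sum-and-difference principle can be shown with a two-line comb. This sketch rests on an idealizing assumption: two vertically adjacent lines carry identical luma and chroma, with the subcarrier phase inverted from one line to the next as frequency interleaving arranges. Real pictures change from line to line, which is where cross-luma and cross-color residues come from. The function name comb and the test signal are hypothetical.

```python
import math

def comb(line_a, line_b):
    """Two-line comb: half the sum gives luma, half the difference gives chroma."""
    luma = [(a + b) / 2 for a, b in zip(line_a, line_b)]
    chroma = [(a - b) / 2 for a, b in zip(line_a, line_b)]
    return luma, chroma

# Hypothetical test signal: flat luma plus chroma at 4 samples per
# subcarrier cycle; the next line carries the same chroma inverted.
n = 16
y = [0.5] * n
c = [0.2 * math.sin(2 * math.pi * k / 4) for k in range(n)]
line1 = [yy + cc for yy, cc in zip(y, c)]
line2 = [yy - cc for yy, cc in zip(y, c)]
luma, chroma = comb(line1, line2)
```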
In industrial and consumer video, subcarrier often free-runs
with respect to line rate, and the advantages of
frequency interleaving are lost. Most forms of analog 
videotape recording introduce timebase error; left 
uncorrected, this also defeats frequency interleaving. 



[Figure 12.3 sketch: 910 samples per total line; 525 lines per frame.]

Figure 12.3 480i, 4fsc NTSC
sampling is line-locked. If the
analog sync edge were to be
digitized, it would take the
same set of values every line.



Composite digital SDTV (4fsc)

Processing of digital composite signals is simplified if 
the sampling frequency is a small integer multiple of the 
color subcarrier frequency. Nowadays, a multiple of four 
is used: It is standard to sample a composite NTSC or 
PAL signal at four-times-subcarrier, or 4fsc (pronounced
four-eff-ess-see).

In 4fsc NTSC systems, the sampling rate is about 14.3 MHz.
Because NTSC's subcarrier is a simple rational multiple
(455/2) of line rate, sampling is line-locked. In line-locked
sampling, every line has the same integer
number of sample periods. In 4fsc NTSC, each line has
910 sample periods (STL), as indicated in Figure 12.3.
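The 910 figure follows directly from the 455/2 relationship. This sketch assumes the usual NTSC line-rate definition of 4.5 MHz/286 and computes the subcarrier and 4fsc rates exactly.

```python
from fractions import Fraction

# 4fsc NTSC: the subcarrier is 455/2 times the line rate, so
# 4 x 455/2 = 910 sample periods fit exactly in each total line.
f_h = Fraction(4_500_000, 286)        # NTSC line rate, about 15.734 kHz
f_sc = Fraction(455, 2) * f_h         # color subcarrier
f_4sc = 4 * f_sc

print(f_4sc / f_h)                    # 910
print(float(f_sc) / 1e6)              # about 3.579545 MHz
print(float(f_4sc) / 1e6)             # about 14.318182 MHz
```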



[Figure 12.4 sketch: 1135 + 4/625 samples per total line; 625 lines per frame.]

Figure 12.4 576i, 4fsc PAL
sampling is not line-locked.



In conventional 576i PAL-B/G/H/I systems, the 4fsc
sampling rate is about 17.7 MHz. Owing to the complex
relationship in "mathematical PAL" between subcarrier
frequency and line rate, sampling in PAL is not line-locked:
There is a noninteger number (1135 + 4/625) of
sample periods per total line, as indicated in
Figure 12.4 in the margin. (In Europe, they say that
"Sampling is not precisely orthogonal.")



If you had to give 4fsc a designation
akin to 4:2:2, you might call it 4:0:0.

During the development of early studio digital standards,
the disadvantages of composite video processing
and recording were widely recognized. The earliest
component digital video standard was Rec. 601,
adopted in 1984; it specified a component video interface
with 4:2:2 chroma subsampling and a sampling
rate of 13.5 MHz, as I described in the previous
chapter. Eight-bit sampling of Rec. 601 has a raw data
rate of 27 MB/s. The first commercial DVTRs were standardized
by SMPTE under the designation D-1. (In
studio video terminology, chroma subsampling is not
considered to be compression.)
Concerning the absence of D-4 in
the numbering sequence, see the 
caption to Table 35.2, on page 423. 



Cable television is detailed in 
Ciciora, Walter, James Farmer, and 
David Large, Modern Cable Televi- 
sion Technology (San Francisco: 
Morgan Kaufmann, 1999). 
Eight-bit sampling of NTSC at 4fsc has a data rate of
about 14.3 MB/s, roughly half that of 4:2:2 sampling. In
1988, four years after the adoption of the D-1 standard,
Ampex and Sony commercialized 4fsc composite digital
recording to enable a cheap DVTR. This was standardized
by SMPTE as D-2. (Despite its higher number, the
format is in most ways technically inferior to D-1.)
Several years later, Panasonic adapted D-2 technology
to ½-inch tape in a cassette almost the same size as
a VHS cassette; this became the D-3 standard. 
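The two raw data rates quoted above come from counting one byte per sample, including every sample period (blanking included). This sketch assumes Rec. 601 4:2:2 carries luma at 13.5 MHz plus CB and CR at 6.75 MHz each, and takes the NTSC subcarrier as 315/88 MHz.

```python
# Raw data rates for 8-bit sampling (1 byte per sample), counting
# every sample period, blanking included.
rec601_luma = 13.5e6
rec601_chroma = 2 * 6.75e6                       # CB and CR at half the luma rate
rec601_rate = rec601_luma + rec601_chroma        # bytes per second

ntsc_4fsc_rate = 4 * 315e6 / 88                  # one composite sample stream

print(rec601_rate / 1e6, "MB/s")                 # 27.0 MB/s
print(round(ntsc_4fsc_rate / 1e6, 1), "MB/s")    # 14.3 MB/s
print(round(ntsc_4fsc_rate / rec601_rate, 2))    # 0.53, roughly half
```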

D-2 and D-3 DVTRs offered the advantages of digital 
recording, but retained the disadvantages of composite 
NTSC or PAL: Luma and chroma were subject to cross- 
contamination, and pictures could not be manipulated 
without decoding and reencoding. 

D-2 and D-3 DVTRs were deployed by broadcasters, 
where composite encoding was inherent in terrestrial 
broadcasting standards. However, for high-end produc- 
tion work, D-1 remained dominant. In 1994, Panasonic 
introduced the D-5 DVTR, which records a 10-bit 
Rec. 601, 4:2:2 signal on ½-inch tape. Recently, VTRs
using compression have proliferated. 

Composite analog SDTV 

Composite analog 480i NTSC and 576i PAL have been
used for terrestrial VHF/UHF broadcasting and cable
television for many decades. I will describe Analog NTSC
and PAL broadcast standards on page 571.

Composite analog 480i NTSC and 576i PAL are widely
deployed in consumer equipment, such as television 
receivers and VCRs. Some degenerate forms of NTSC 
and PAL are used in consumer electronic devices; see 
Consumer analog NTSC and PAL, on page 579. 
Fujio, T., J. Ishida, T. Komoto, and
T. Nishizawa, High Definition
Television System: Signal Standards
and Transmission, NHK Science and
Technical Research Laboratories
Technical Note 239 (Aug. 1979);
reprinted in SMPTE Journal, 89 (8):
579-584 (Aug. 1980).

Fujio, T., et al., High Definition
Television, NHK Science and Technical
Research Laboratories Technical
Monograph 32 (June 1982).



Developmental HDTV systems had
1125/60.00/2:1 scanning, an
aspect ratio of 5:3, and 1035
active lines. The alternate 59.94 Hz
field rate was added later. Aspect 
ratio was changed to 16:9 to 
achieve international agreement 
upon standards. Active line count 
of 1080 was eventually agreed 
upon to provide square sampling. 



This chapter outlines the 1280×720 and 1920×1080
image formats for high-definition television (HDTV),
and introduces the scanning parameters of the associated
video systems such as 720p60 and 1080i30.

Today's HDTV systems stem from research directed by 
Dr. Fujio at NHK (Nippon Hoso Kyokai, the Japan 
Broadcasting Corporation). HDTV was conceived to 
have twice the vertical and twice the horizontal resolu- 
tion of conventional television, a picture aspect ratio of 
5:3 (later altered to 16:9), and at least two channels of 
CD-quality audio. Today we can augment this by speci- 
fying a frame rate of 23.976 Hz or higher. Some people 
consider 480p systems to be HDTV, but by my definition,
HDTV has ¾-million pixels or more. NHK
conceived HDTV to have interlaced scanning; however, 
progressive HDTV systems have emerged. 

Studio HDTV has a sampling rate of 74.25 MHz,
5.5 times that of the Rec. 601 standard for SDTV. HDTV
has a pixel rate of about 60 megapixels per second.
Other parameters are similar or identical to SDTV stan- 
dards. Details concerning scanning, sample rates, and 
interface levels of HDTV will be presented in 1280x720 
HDTV on page 547 and 1920x1080 HDTV on page 557. 
Unfortunately, the parameters for Y'CBCR color coding
for HDTV differ from the parameters for SDTV! Details
will be provided in Component video color coding for
HDTV, on page 313.
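As a quick arithmetic check of the figures above, here is a minimal sketch that computes the ratio of the HDTV and Rec. 601 sampling rates, and the active pixel rates of 720p60 and 1080i30. It counts only active pixels (blanking excluded); the function name is illustrative.

```python
# Studio sampling rates from the text, in Hz.
HDTV_SAMPLE_RATE = 74.25e6
REC601_SAMPLE_RATE = 13.5e6


def pixel_rate(width, height, frame_rate):
    """Active pixels per second for an image format and frame rate."""
    return width * height * frame_rate


ratio = HDTV_SAMPLE_RATE / REC601_SAMPLE_RATE
rate_720p60 = pixel_rate(1280, 720, 60)
rate_1080i30 = pixel_rate(1920, 1080, 30)  # 30 interlaced frames per second

print(f"sampling-rate ratio: {ratio}")             # 5.5
print(f"720p60:  {rate_720p60 / 1e6:.1f} Mpx/s")   # 55.3
print(f"1080i30: {rate_1080i30 / 1e6:.1f} Mpx/s")  # 62.2
```

Both HDTV formats land near the "about 60 megapixels per second" quoted in the text.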






Figure 13.1 Comparison 
of aspect ratios between 
conventional television 
and HDTV was attempted 
using various measures: 
equal height, equal width, 
equal diagonal, and equal 
area. All of these compari- 
sons overlooked the 
fundamental improvement 
of HDTV: its increased 
pixel count. The correct 
comparison is based upon 
equal picture detail. 



Figure 13.1 panels: 4:3 aspect ratio versus 16:9 aspect ratio, compared at equal height, equal width, equal diagonal, equal area, and equal detail.



Comparison of aspect ratios 

When HDTV was introduced to the consumer elec- 
tronics industry in North America, SDTV and HDTV 
were compared using various measures, sketched in 
Figure 13.1 above, based upon the difference in aspect 
ratio between 4:3 and 16:9. Comparisons were made 
on the basis of equal height, equal width, equal diag- 
onal, and equal area. 
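The four geometric comparisons are easy to compute. The sketch below takes a 4:3 picture 4 units wide and 3 high and returns the size of the 16:9 picture that matches it under each measure. The function name and unit sizes are illustrative.

```python
import math


def compare_16x9(w43=4.0, h43=3.0):
    """Return {measure: (width, height)} of a 16:9 picture matched to a
    4:3 picture of size w43 x h43 under each of the four measures."""
    ar = 16 / 9
    out = {}
    out["equal height"] = (ar * h43, h43)
    out["equal width"] = (w43, w43 / ar)
    # A 16:9 picture of height h has diagonal h * hypot(ar, 1).
    h = math.hypot(w43, h43) / math.hypot(ar, 1)
    out["equal diagonal"] = (ar * h, h)
    # A 16:9 picture of height h has area ar * h**2.
    h = math.sqrt(w43 * h43 / ar)
    out["equal area"] = (ar * h, h)
    return out


for name, (w, h) in compare_16x9().items():
    print(f"{name:15s} {w:.2f} x {h:.2f}")
```

Equal height yields a picture 5.33 wide; equal width, one 2.25 high; equal diagonal, one about 4.36 wide. None of these says anything about picture detail, which is the point of the paragraph that follows.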

All of those measures overlooked the fundamental 
improvement of HDTV: Its "high definition" - that is, its 
resolution - does not squeeze six times the number of 
pixels into the same visual angle! Instead, the angular 
subtense of a single pixel should be maintained, and 
the entire image may now occupy a much larger area of 
the viewer's visual field. HDTV allows a greatly 
increased picture angle. The correct comparison 
between conventional television and HDTV is not based 
upon aspect ratio; it is based upon picture detail. 
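Comparing on equal detail means keeping each pixel's angular size fixed and letting the picture grow. The sketch below assumes, for illustration only, a viewing distance at which one picture line subtends one arcminute in both systems; the vertical picture angle then scales with the active line count.

```python
SDTV = (720, 480)    # Rec. 601 active samples per line, active lines
HDTV = (1920, 1080)  # 1080-line HDTV


def pixel_count(fmt):
    return fmt[0] * fmt[1]


# Assumption (hypothetical viewing condition): one picture line
# subtends about one arcminute for both systems.
ARCMIN_PER_LINE = 1.0


def vertical_angle_deg(fmt):
    """Vertical picture angle, in degrees, at ARCMIN_PER_LINE."""
    return fmt[1] * ARCMIN_PER_LINE / 60


print(pixel_count(HDTV) / pixel_count(SDTV))             # 6.0
print(vertical_angle_deg(SDTV), vertical_angle_deg(HDTV))  # 8.0 18.0
```

HDTV carries six times the pixels of SDTV; at equal detail that extra pixel count becomes a larger picture angle, not a denser picture.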
Figure 13.2 HDTV rasters at 30 and 60 frames per second are standardized in two formats,
1280x720 (1 Mpx, always progressive), and 1920x1080 (2 Mpx, interlaced or progressive). The
latter is often denoted 1080i, but the standards accommodate progressive scan. These sketches are
scaled to match Figures 11.1, 11.2, and 11.6; pixels in all of these sketches have identical area.



HDTV scanning 

A great debate took place in the 1980s and 1990s 
concerning whether HDTV should have interlaced or 
progressive scanning. At given flicker and data rates, 
interlace offers some increase in static spatial resolu- 
tion, as suggested by Figure 6.8 on page 59. Broad- 
casters have historically accepted the motion artifacts 
and spatial aliasing that accompany interlace, in order 
to gain some static spatial resolution. In the HDTV 
debate, the computer industry and the creative film 
community were set against interlace. Eventually, both 
interlaced and progressive scanning were standardized; 
to be commercially viable, a receiver must decode both 
formats. 



In Numerology of HDTV scanning, 
on page 377, I explain the origin 
of the numbers in Figure 13.2. 



Figure 13.2 above sketches the rasters of the 1 Mpx
progressive system (1280x720, 720p60) and the 2 Mpx
interlaced system (1920x1080, 1080i30) that were
agreed upon. The 1920x1080 system is easily adapted
to 24 and 30 Hz progressive scan (1080p24, 1080p30).
Frame rates modified by the ratio 1000/1001 - that is, frame
rates of 23.976 Hz, 29.97 Hz, and 59.94 Hz - are
permitted.



Table 13.1 ATSC A/53 Table 3 defines the so-called 18 formats - including 12 SDTV formats - for
digital television in the U.S. I find the layout of ATSC's Table 3 to be hopelessly contorted, so
I rearranged it. ATSC specifies 704 SAL for several SDTV formats, instead of Rec. 601's 720 SAL; see
page 325. ATSC standard A/53 doesn't accommodate 25 Hz and 50 Hz frame rates, but A/63 does.

ATSC A/53, Digital Television Standard.

In addition to the 1 Mpx (progressive) and 2 Mpx
(interlaced) systems, several SDTV scanning systems and
several additional frame rates were included in the
ultimate ATSC standards for U.S. digital television
(DTV). Table 13.1 above summarizes the "18 formats"
that are found in Table 3 of the ATSC's A/53 standard.

Figure 13.2 sketched the 1920x1080 image format for 
frame rates of 30 Hz and 60 Hz. This image format can 
be carried at frame rates of 24 Hz and 25 Hz, using the 
standard 74.25 MHz sample rate. Figure 13.3 at the top 
of the facing page sketches raster structures for 24 Hz 
and 25 Hz systems; Table 13.2 overleaf summarizes the 
scanning parameters. 

To carry a 1920x1080 image at a frame rate of 25 Hz, 
two approaches have been standardized. One approach 
is standardized in SMPTE 274M: 1125 total lines are 
retained, and STL is increased to 2640. This yields the
1080p25 format, using an 1125/25 raster. Scanning can
be either progressive or interlaced; with progressive
scanning, the signal is usually interfaced using the
progressive segmented frame (PsF) scheme that
I introduced on page 62.

Figure 13.3 HDTV rasters at 24 Hz and 25 Hz carry an array of 1920x1080 active samples, using
a 74.25 MHz sampling rate at the interface. For 24 Hz (1080p24), the 1920x1080 array is carried
in an 1125/24 raster. For 25 Hz, the array is carried in an 1125/25 raster.

Some European video engineers dislike 1125 lines, so in 
addition to the approach sketched in Figure 13.3 an 
alternative approach is standardized in SMPTE 295M:
The 1920x1080 image is placed in a 1250/25/2:1 raster
with 2376 STL. I recommend against this approach:
Systems with 1125 total lines are now the mainstream. 

For 24 Hz, 1125 total lines are retained, and STL is
increased to 2750 to achieve the 24 Hz frame rate. This
yields the 1080p24 format, in an 1125/24 raster. This
system is used in emerging digital cinema (D-cinema)
products. A variant at 23.976 Hz is accommodated.
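Each STL value above follows from keeping the 74.25 MHz sampling rate fixed: STL is the sampling rate divided by the product of total lines and frame rate. The sketch below checks the four rasters discussed; it uses exact rational arithmetic so that a non-integer result would be caught. The function name is illustrative.

```python
from fractions import Fraction

SAMPLE_RATE = 74_250_000  # Hz: studio HDTV sampling rate from the text


def samples_per_total_line(total_lines, frame_rate):
    """STL = sampling rate / (total lines per frame x frames per second)."""
    stl = Fraction(SAMPLE_RATE) / (total_lines * Fraction(frame_rate))
    if stl.denominator != 1:
        raise ValueError("raster does not yield a whole number of samples")
    return int(stl)


for lines, rate in [(1125, 30), (1125, 25), (1125, 24), (1250, 25)]:
    print(f"{lines}/{rate}: STL = {samples_per_total_line(lines, rate)}")
# 1125/30: STL = 2200
# 1125/25: STL = 2640
# 1125/24: STL = 2750
# 1250/25: STL = 2376
```

Interlace does not change STL, since the frame still contains the same total lines. For the 1000/1001 variants, the sampling rate is scaled by the same factor as the frame rate, so STL is unchanged.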

In Sony's HDCAM system, the 1920x1080 image is
downsampled to 1440x1080, and color differences are
subsampled 3:1:1, prior to compression. This is an
internal representation only; there is no corresponding 
uncompressed external interface standard. 
Table 13.2 HDTV scanning parameters are summarized. The 1035i30 system, flagged
above, is not recommended for new designs; use 1080i30 instead. SMPTE 274M includes
a progressive 2 Mpx, 1080p60 system with 1125/60/1:1 scanning, also flagged above; this
system is beyond the limits of today's technology. Each of the 24, 30, and 60 Hz systems above has
an associated system at 1000/1001 of that rate.



Table 13.2 summarizes the scanning parameters for 
720p, 1080i, and 1080p systems. Studio interfaces for
HDTV will be introduced in Digital video interfaces, on 
page 127. HDTV videotape recording standards will be 
introduced in Videotape recording, on page 411. 

The 1035i (1125/60) system

The SMPTE 240M standard for 1125/60.00/2:1 HDTV
was adopted in 1988. The 1125/60 system, now called
1035i30, had 1920x1035 image structure with
nonsquare sampling: Pixels were 4% closer horizontally 
than vertically. After several years, square sampling was 
introduced into the SMPTE standards, and subse- 
quently, into ATSC standards. 1920x1035 image struc- 
ture has been superseded by 1920x1080, and square 
sampling is now a feature of all HDTV studio standards. 
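The "4% closer horizontally" figure follows from the geometry. Over a 16:9 picture, the horizontal sample pitch is width/1920 and the vertical pitch is height/1035. The sketch below computes their ratio exactly; the function name is illustrative.

```python
from fractions import Fraction


def sample_aspect_ratio(picture_ar, width, height):
    """Horizontal sample pitch divided by vertical sample pitch."""
    return picture_ar / Fraction(width, height)


sar_1035 = sample_aspect_ratio(Fraction(16, 9), 1920, 1035)
sar_1080 = sample_aspect_ratio(Fraction(16, 9), 1920, 1080)
print(sar_1035)  # 23/24, about 0.958: samples about 4% closer horizontally
print(sar_1080)  # 1: square sampling
```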

Color coding for Rec. 709 HDTV 

Rec. 709 defines Y'CBCR color coding. Unfortunately,
the luma coefficients standardized in Rec. 709 - and
the CBCR scale factors derived from them - differ from
those of SDTV. Y'CBCR coding now comes in two
flavors: coding for small (SDTV) pictures, and coding for 
large (HDTV) pictures. I will present details concerning 
this troublesome issue in SDTV and HDTV luma chaos, 
on page 296. 
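To see why the two flavors are incompatible, here is a minimal sketch that encodes the same R'G'B' color with the Rec. 601 luma coefficients (0.299, 0.587, 0.114) and the Rec. 709 coefficients (0.2126, 0.7152, 0.0722). It uses the unscaled form, with CB and CR in the range -0.5 to +0.5; the 8-bit interface scaling is deliberately omitted, and the function names are illustrative.

```python
# Luma coefficients (kR, kG, kB), applied to R'G'B' in the range 0 to 1.
REC601 = (0.299, 0.587, 0.114)
REC709 = (0.2126, 0.7152, 0.0722)


def luma(rgb, k):
    kr, kg, kb = k
    r, g, b = rgb
    return kr * r + kg * g + kb * b


def ycbcr(rgb, k):
    """Unscaled Y'CbCr: CB = (B' - Y') / (2(1 - kB)), CR = (R' - Y') / (2(1 - kR))."""
    kr, _, kb = k
    y = luma(rgb, k)
    cb = (rgb[2] - y) / (2 * (1 - kb))
    cr = (rgb[0] - y) / (2 * (1 - kr))
    return y, cb, cr


red = (1.0, 0.0, 0.0)
print(ycbcr(red, REC601))  # luma about 0.299
print(ycbcr(red, REC709))  # luma about 0.213
```

A decoder that assumes the wrong set of coefficients reconstructs the wrong R'G'B', which shows up as subtle but visible color errors.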
14 Introduction to video compression



Directly storing or transmitting Y'CBCR digital video
requires immense data capacity - about 20 megabytes 
per second for SDTV, or about 120 megabytes per 
second for HDTV. First-generation studio digital VTRs, 
and today's highest-quality studio VTRs, store uncom- 
pressed video; however, economical storage or trans- 
mission requires compression. This chapter introduces 
the JPEG, M-JPEG, and MPEG compression techniques. 
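The uncompressed data rates quoted above can be reproduced from the image formats. The sketch assumes 8-bit 4:2:2 Y'CBCR, which averages two bytes per pixel, and counts only the active picture. The function name is illustrative.

```python
def data_rate_bytes(width, height, frame_rate, bytes_per_pixel=2):
    """Active-picture data rate; 2 bytes/pixel assumes 8-bit 4:2:2 Y'CbCr."""
    return width * height * frame_rate * bytes_per_pixel


sdtv = data_rate_bytes(720, 480, 30000 / 1001)  # 480i29.97
hdtv = data_rate_bytes(1920, 1080, 30)          # 1080i30
print(f"SDTV: {sdtv / 1e6:.1f} MB/s")  # 20.7
print(f"HDTV: {hdtv / 1e6:.1f} MB/s")  # 124.4
```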

Data compression 

Data compression reduces the number of bits required 
to store or convey text, numeric, binary, image, sound, 
or other data, by exploiting statistical properties of the 
data. The reduction comes at the expense of some 
computational effort to compress and decompress. 

Data compression is, by definition, lossless: Decompres- 
sion recovers exactly, bit for bit (or byte for byte), the 
data that was presented to the compressor. 

Binary data typical of general computer applications 
often has patterns of repeating byte strings and 
substrings. Most data compression techniques, 
including run-length encoding (RLE) and Lempel-Ziv- 
Welch (LZW), accomplish compression by taking advan- 
tage of repeated substrings; performance is highly 
dependent upon the data being compressed. 

Image compression 

Image data typically has strong vertical and horizontal 
correlations among pixels. When the RLE and LZW 
algorithms are applied to bilevel or pseudocolor image 
data stored in scan-line order, horizontal correlation
among pixels is exploited to some degree, and usually 
results in modest compression (perhaps 2:1). 
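Run-length encoding illustrates how horizontal correlation along a scan line is exploited. The sketch below encodes a hypothetical bilevel scan line as (value, run length) pairs and confirms that decoding recovers it exactly, as lossless data compression requires.

```python
from itertools import groupby


def rle_encode(line):
    """Encode a scan line as a list of (value, run length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(line)]


def rle_decode(pairs):
    """Expand (value, run length) pairs back into a scan line."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


# Hypothetical bilevel scan line: 0 = white, 1 = black.
line = [0] * 20 + [1] * 6 + [0] * 38
pairs = rle_encode(line)
print(pairs)  # [(0, 20), (1, 6), (0, 38)]
assert rle_decode(pairs) == line  # exact, bit-for-bit reconstruction
```

Sixty-four samples reduce to three pairs here only because the line has long runs; a noisy or continuous-tone line has few runs, and RLE can even expand it.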

A data compression algorithm can be designed to 
exploit the statistics of image data, as opposed to arbi- 
trary binary data; improved compression is then 
possible. For example, the ITU-T (formerly CCITT) fax
standard for bilevel image data exploits vertical and 
horizontal correlation to achieve much higher average 
compression ratios than are possible with RLE or LZW. 

Transform techniques are effective for the compression 
of continuous-tone (grayscale or truecolor) image data. 
The discrete cosine transform (DCT) has been developed 
and optimized over the last few decades; it is now the 
method of choice for continuous-tone compression. 
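The reason the DCT suits continuous-tone images is energy compaction: a smooth run of samples transforms into a few significant coefficients. The sketch below implements a direct, unoptimized orthonormal 1-D DCT-II on eight hypothetical samples, together with its inverse. JPEG applies the 2-D form to 8x8 blocks, which is not shown here.

```python
import math


def dct_ii(x):
    """Orthonormal 1-D DCT-II, computed directly (O(n^2))."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        out.append(scale * s)
    return out


def idct_ii(c):
    """Inverse of dct_ii (the orthonormal DCT-III)."""
    n = len(c)
    return [sum((math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)) * c[k]
                * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]


# A smooth, hypothetical row of 8 luma samples.
row = [100, 102, 104, 106, 108, 110, 112, 114]
coeffs = dct_ii(row)
print([round(c, 1) for c in coeffs])
```

Nearly all of the row's energy lands in the first two coefficients; the transform itself is lossless, since `idct_ii` recovers the row.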

Lossy compression 

Data compression is lossless, by definition: The decom- 
pression operation reproduces, bit-for-bit, the data 
presented to the compressor. In principle, lossless data 
compression could be optimized to achieve modest 
compression of continuous-tone (grayscale or truecolor) 
image data. However, the characteristics of human 
perception can be exploited to achieve dramatically 
higher compression ratios if the requirement of exact 
reconstruction is relaxed: Image or sound data can be 
subject to lossy compression, provided that the impair- 
ments introduced are not overly perceptible. Lossy 
compression schemes are not appropriate for bilevel or 
pseudocolor images, but they are very effective for 
grayscale or truecolor images. 
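The "loss" in lossy compression typically comes from quantization. The sketch below applies a single uniform quantizer step to a hypothetical set of transform coefficients. Real JPEG uses a table of per-coefficient step sizes; one step is a simplification. Small coefficients collapse to zero, which then codes compactly, and each reconstructed value is within half a step of the original.

```python
def quantize(values, step):
    """Uniform quantizer: the irreversible (lossy) step."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    """Reconstruct approximate values from quantizer levels."""
    return [q * step for q in levels]


# Hypothetical transform coefficients: one large, the rest small.
coeffs = [302.6, -12.9, 0.4, -1.2, 0.1, -0.4, 0.2, -0.1]
step = 4.0
levels = quantize(coeffs, step)
recon = dequantize(levels, step)
print(levels)  # [76, -3, 0, 0, 0, 0, 0, 0]
```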

JPEG refers to a lossy compression method for still 
images. Its variant M-JPEG is used for motion 
sequences; DVC equipment uses an M-JPEG algorithm. 
MPEG refers to a lossy compression standard for video 
sequences; MPEG-2 is used in digital television distri- 
bution (e.g., ATSC and DVB), and in DVD. I will 
describe these techniques in subsequent sections. 

Table 14.1 at the top of the facing page compares 
typical compression ratios of M-JPEG and MPEG-2, for 
SDTV and HDTV. 
Format              Uncompressed       Motion-JPEG          MPEG-2
                    data rate, MB/s    compression ratio    compression ratio
SDTV                20                 15:1 (e.g., DVC)     45:1 (e.g., DVD)
(480i30, 576i25)
HDTV                120                15:1                 75:1 (e.g., ATSC)
(720p60, 1080i30)

Table 14.1 Approximate compression ratios of M-JPEG and MPEG-2 for SDTV and HDTV



JPEG 

JPEG stands for Joint Photographic Experts Group, constituted with ITU-T (the former CCITT).

The JPEG committee developed a standard suitable for
compressing grayscale or truecolor still images. The
standard was originally intended for color fax, but it
was quickly adopted and widely deployed for still
images in desktop graphics and digital photography.

A JPEG compressor ordinarily transforms R'G'B' to
Y'CBCR, then applies 4:2:0 chroma subsampling to effect
2:1 compression. (In desktop graphics, this 2:1 factor is
included in the compression ratio.) JPEG has provisions
to compress R'G'B' data directly, without subsampling.
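The 2:1 factor from 4:2:0 subsampling is a matter of sample counting. At 4:4:4, each pixel carries three samples. At 4:2:0, CB and CR are each subsampled by two both horizontally and vertically, leaving 1.5 samples per pixel. The sketch uses an illustrative 640x480 image.

```python
def samples_per_frame(width, height, scheme):
    """Total Y' + CB + CR samples in one frame for a chroma scheme."""
    luma = width * height
    if scheme == "4:4:4":
        chroma = 2 * luma
    elif scheme == "4:2:2":
        chroma = 2 * (width // 2) * height
    elif scheme == "4:2:0":
        chroma = 2 * (width // 2) * (height // 2)
    else:
        raise ValueError(f"unknown scheme {scheme!r}")
    return luma + chroma


full = samples_per_frame(640, 480, "4:4:4")
sub = samples_per_frame(640, 480, "4:2:0")
print(full / sub)  # 2.0
```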

Motion-JPEG

The JPEG algorithm - though not the ISO/IEC JPEG
standard - has been adapted to compress motion 
video. Motion-JPEG simply compresses each field or 
frame of a video sequence as a self-contained 
compressed picture - each field or frame is intra coded. 
Because pictures are compressed individually, an 
M-JPEG video sequence can be edited; however, no 
advantage is taken of temporal coherence. 

Video data is almost always presented to an M-JPEG 
compression system in Y'CBCR subsampled form. (In
video, the 2:1 factor due to chroma subsampling is 
generally not included in the compression ratio.) 

The M-JPEG technique achieves compression ratios 
ranging from about 2:1 to about 20:1. The 20 MB/s data 
rate of digital video can be compressed to about 
20 Mb/s, suitable for recording on consumer digital 
videotape (e.g., DVC). M-JPEG compression ratios and 
tape formats are summarized in Table 14.2 overleaf. 
Compression ratio   Quality/application                Example tape formats
2:1                 "Visually lossless" studio video   Digital Betacam
3.3:1               Excellent-quality studio video     DVCPRO50, D-9 (Digital-S)
6.6:1               Good-quality studio video;         D-7 (DVCPRO), DVCAM,
                    consumer digital video             consumer DVC, Digital8

Table 14.2 Approximate compression ratios of M-JPEG for SDTV applications



MPEG 

Apart from scene changes, there is a statistical likeli- 
hood that successive pictures in a video sequence are 
very similar. In fact, it is necessary that successive 
pictures are similar: If this were not the case, human 
vision could make no sense of the sequence! 

The M in MPEG stands for moving, not motion!

M-JPEG's compression ratio can be increased by
a factor of 5 or 10 by exploiting the inherent temporal
redundancy of video. The MPEG standard was developed
by the Moving Picture Experts Group within ISO
and IEC. In MPEG, an initial, self-contained picture 
provides a base value - it forms an anchor picture. 
Succeeding pictures can then be coded in terms of pixel 
differences from the anchor, as sketched in Figure 14.1 
at the top of the facing page. The method is termed 
interframe coding (though differences between fields 
may be used). 

Once the anchor picture has been received by the 
decoder, it provides an estimate for a succeeding 
picture. This estimate is improved when the encoder 
transmits the prediction errors. The scheme is effective 
provided that the prediction errors can be coded more 
compactly than the raw picture information. 
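The scheme of Figure 14.1 can be sketched in a few lines, using hypothetical one-dimensional "pictures" of six pixels each. The encoder sends the first picture whole, then per-pixel differences from the previous picture; the decoder accumulates them. Because nothing is quantized here, predicting from the source pictures gives the same result as predicting from reconstructed pictures. In a lossy coder, the encoder must predict from its own reconstruction to stay in step with the decoder.

```python
def encode(pictures):
    """Send the first picture whole, then differences from the previous one."""
    base = pictures[0]
    diffs = []
    prev = base
    for pic in pictures[1:]:
        diffs.append([b - a for a, b in zip(prev, pic)])
        prev = pic
    return base, diffs


def decode(base, diffs):
    """Rebuild each picture by adding its differences to the previous one."""
    out = [list(base)]
    for d in diffs:
        out.append([p + e for p, e in zip(out[-1], d)])
    return out


# Hypothetical 6-pixel pictures: mostly static, one pixel brightening.
seq = [[10, 10, 50, 50, 10, 10],
       [10, 10, 50, 51, 10, 10],
       [10, 10, 50, 52, 10, 10]]
base, diffs = encode(seq)
print(diffs)  # [[0, 0, 0, 1, 0, 0], [0, 0, 0, 1, 0, 0]]
assert decode(base, diffs) == seq
```

The difference pictures are mostly zeros, which is exactly what makes them cheaper to code than the raw pictures.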

Motion may cause displacement of scene elements - 
a fast-moving element may easily move 10 pixels in one 
frame time. In the presence of motion, a pixel at a 
certain location may take quite different values in 
successive pictures. Motion would cause the prediction 
error information to grow in size to the point where the 
advantage of interframe coding would be negated. 
Figure 14.1 Interpicture coding exploits the similarity between successive pictures in video.

First, a base picture is transmitted (ordinarily using intra-picture compression). Then, pixel differ- 
ences to successive pictures are computed by the encoder and transmitted. The decoder recon- 
structs successive pictures by accumulating the differences. The scheme is effective provided that 
the difference information can be coded more compactly than the raw picture information. 



However, objects tend to retain their characteristics 
even when moving. MPEG overcomes the problem of 
motion between pictures by equipping the encoder 
with motion estimation circuitry: The encoder computes 
motion vectors. The encoder then displaces the pixel 
values of the anchor picture by the estimated motion - 
a process called motion compensation - then computes 
prediction errors from the motion-compensated anchor 
picture. The encoder compresses the prediction error 
information using a JPEG-like technique, then trans- 
mits that data accompanied by motion vectors. 



When encoding interlaced source 
material, an MPEG-2 encoder can 
choose to code each field as 
a picture or each frame as 
a picture, as I will describe on 
page 478. In this chapter, and in 
Chapter 40, the term picture can 
refer to either a field or a frame. 



Based upon the received motion vectors, the decoder 
mimics the motion compensation of the encoder to 
obtain a predictor much more effective than the undis- 
placed anchor picture. The transmitted prediction errors 
are then applied to reconstruct the picture. 
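MPEG standardizes the bitstream and the decoder, not how an encoder finds its motion vectors. One common approach is full-search block matching, which tries every displacement in a window and keeps the one with the smallest sum of absolute differences (SAD). The sketch below uses a hypothetical 4x4 block and a search range of plus or minus 3 pixels on a synthetic 16x16 picture; MPEG macroblocks are 16x16 luma samples, and practical encoders use faster searches.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of cur at
    (bx, by) and the block of ref displaced by (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def estimate_motion(cur, ref, bx, by, n=4, search=3):
    """Full search over +/-search pixels; return the (dx, dy) with least SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that would fall outside the reference.
            if not (0 <= bx + dx and bx + dx + n <= w
                    and 0 <= by + dy and by + dy + n <= h):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def compensate(ref, bx, by, dx, dy, n=4):
    """Motion-compensated predictor: the displaced n x n block of ref."""
    return [row[bx + dx:bx + dx + n] for row in ref[by + dy:by + dy + n]]


# Hypothetical 16 x 16 textured reference picture; in the current picture
# the content has moved 2 pixels right and 1 pixel down.
SIZE = 16
ref = [[(3 * x * x + 5 * y * y + x * y) % 97 for x in range(SIZE)]
       for y in range(SIZE)]
cur = [[ref[(y - 1) % SIZE][(x - 2) % SIZE] for x in range(SIZE)]
       for y in range(SIZE)]

dx, dy = estimate_motion(cur, ref, bx=6, by=6)
predictor = compensate(ref, 6, 6, dx, dy)
residual = [[c - p for c, p in zip(crow[6:10], prow)]
            for crow, prow in zip(cur[6:10], predictor)]
print((dx, dy))  # (-2, -1): the block came from 2 left, 1 up in ref
print(residual)  # all zeros: the prediction is exact in this example
```

The decoder needs only `compensate`: given the transmitted vector, it forms the same predictor and adds the transmitted residual.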

Picture coding types (I, P, B) 

In MPEG, a video sequence is typically partitioned into 
successive groups of pictures (GOPs). The first frame in 
each GOP is coded independently of other frames using 
a JPEG-like algorithm; this is an intra picture or 
I-picture. Once reconstructed, an I-picture becomes an
anchor picture available for use in predicting neighboring
(nonintra) pictures. The example GOP sketched
in Figure 14.2 overleaf comprises nine pictures. 



A P-picture contains elements that are predicted from
the most recent anchor frame. Once a P-picture is
reconstructed, it is displayed; in addition, it becomes
a new anchor. I-pictures and P-pictures form a
two-layer hierarchy. An I-picture and two dependent
P-pictures are depicted in Figure 14.3 below.

Figure 14.2 MPEG group of
pictures (GOP). The GOP
depicted here has nine pictures,
numbered 0 through 8. I-picture 0
is decoded from the coded data
depicted in the dark gray block.
Picture 9 is not in the GOP; it is
the first picture of the next GOP.
Here, the intra count (n) is 9.

MPEG provides an optional third hierarchical level 
whereby B-pictures may be interposed between anchor 
pictures. Elements of a B-picture may be bidirectionally 
predicted by averaging motion-compensated elements 
from the past anchor and motion-compensated 
elements from the future anchor. Each B-picture is 
reconstructed, displayed, and discarded: No B-picture 
forms the basis for any prediction. (At the encoder's 
discretion, elements of a B-picture may be unidirectionally
forward-predicted from the preceding anchor, or
unidirectionally backward-predicted from the following 
anchor.) Using B-pictures delivers a substantial gain in 
compression efficiency compared to encoding with just 
I- and P-pictures. 

Two B-pictures are depicted in Figure 14.4 at the top of 
the facing page. The three-level MPEG picture hier- 
archy is summarized in Figure 14.5 at the bottom of the 
facing page; this example has the structure IBBPBBPBB. 



Figure 14.3 An MPEG P-picture 

contains elements forward- 
predicted from a preceding 
anchor picture, which may be an 
I-picture or a P-picture. Here,
the first P-picture (3) is predicted
from an I-picture (0). Once
decoded, that P-picture 
becomes the predictor for the 
second P-picture (6). 
Figure 14.4 An MPEG
B-picture is generally esti- 
mated from the average of the 
preceding anchor picture and 
the following anchor picture. 
(At the encoder's option, a 
B-picture may be unidirection- 
ally forward-predicted from the 
preceding anchor, or unidirec- 
tionally backward-predicted 
from the following anchor.) 




A simple encoder typically produces a bitstream having 
a fixed schedule of I-, P-, and B-pictures. A typical GOP 
structure is denoted IBBPBBPBBPBBPBB. At 30 pictures 
per second, there are two such GOPs per second. 
Regular GOP structure is described by a pair of integers 
n and m; n is the number of pictures from one I-picture
(inclusive) to the next (exclusive), and m is the number 
of pictures from one anchor picture (inclusive) to the 
next (exclusive). If m = 1, there are no B-pictures. 

Figure 14.5 shows a regular GOP structure with an 
I-picture interval of n = 9 and an anchor-picture interval
of m = 3. The m = 3 component indicates two B-pictures 
between anchor pictures. 
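A regular GOP follows mechanically from n and m. The sketch below generates the picture types in display order. It assumes that n is a multiple of m, as in the examples in the text; the function name is illustrative.

```python
def gop_pattern(n, m):
    """Picture types, in display order, of a regular GOP.

    n: pictures from one I-picture (inclusive) to the next (exclusive).
    m: pictures from one anchor picture (inclusive) to the next (exclusive).
    Assumes n is a positive multiple of m.
    """
    if n < 1 or m < 1 or n % m != 0:
        raise ValueError("expects n to be a positive multiple of m")
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")  # the GOP opens with an intra picture
        elif i % m == 0:
            types.append("P")  # every m-th picture is a P-anchor
        else:
            types.append("B")  # the m - 1 pictures between anchors
    return "".join(types)


print(gop_pattern(9, 3))   # IBBPBBPBB
print(gop_pattern(15, 3))  # IBBPBBPBBPBBPBB
print(gop_pattern(9, 1))   # IPPPPPPPP (m = 1: no B-pictures)
```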



Figure 14.5 The three-level 
MPEG picture hierarchy. This 
sketch shows a regular GOP 
structure with an I-picture
interval of n = 9, and an
anchor-picture interval of m = 3.
This example represents a simple
encoder that emits a fixed
schedule of I-, B-, and
P-pictures; this structure can be
described as IBBPBBPBB. This
example depicts an open GOP,
where B-pictures following the
last P-picture of the GOP are
permitted to use backward
prediction from the I-frame of
the following GOP. Such 
prediction precludes editing of 
the bitstream between GOPs. 

A closed GOP permits no such 
prediction, so the bitstream 
can be edited between GOPs. 
Coded B-pictures in a GOP depend upon P- and
I-pictures; coded P-pictures depend upon earlier
P-pictures and I-pictures. Owing to these
interdependencies, an MPEG sequence cannot be edited,
except at GOP boundaries, unless the sequence is
decoded, edited, and subsequently reencoded. MPEG is
very suitable for distribution, but owing to its inability
to be edited without impairment at arbitrary points,
MPEG is unsuitable for production. In the specialization
of MPEG-2 called I-frame only MPEG-2, every GOP is
a single I-frame. This is conceptually equivalent to
Motion-JPEG, but has the great benefit of an
international standard. (Another variant of MPEG-2, the
simple profile, has no B-pictures.)

I have introduced MPEG as if all elements of a P-picture 
and all elements of a B-picture are coded similarly. But 
a picture that is generally very well predicted by the 
past anchor picture may have a few regions that cannot 
effectively be predicted. In MPEG, the image is tiled 
into macroblocks of 16x16 luma samples, and the 
encoder is given the option to code any particular 
macroblock in intra mode - that is, independently of 
any prediction. A compact code signals that a macrob- 
lock should be skipped, in which case samples from the 
anchor picture are used without modification. Also, in 
a B-picture, the encoder can decide on a macroblock- 
by-macroblock basis to code using forward prediction, 
backward prediction, or bidirectional prediction. 

Reordering 

In a sequence without B-pictures, I- and P-pictures are 
encoded and transmitted in the obvious order. 

However, when B-pictures are used, the decoder typi- 
cally needs to access the past anchor picture and the 
future anchor picture to reconstruct a B-picture. 

Figure 14.6 Example GOP

I0 B1 B2 P3 B4 B5 P6 B7 B8



Consider an encoder about to compress the sequence
in Figure 14.6 (where I0, P3, and P6 are the anchor
pictures). The coded B1 and B2 pictures may be
backward predicted from P3, so the encoder must
buffer the uncompressed B1 and B2 pictures until P3 is
coded: Only when coding of P3 is complete can coding
of B1 start. Using B-pictures incurs a penalty in
encoding delay. (If the sequence were coded without
B-pictures, as depicted in Figure 14.7, transmission of
the coded information for P1 would not be subject to
this two-picture delay.) Coding delay can make MPEG
with B-pictures unsuitable for realtime two-way
applications such as teleconferencing.

Figure 14.7 Example 9-frame
GOP without B-pictures

I0 P1 P2 P3 P4 P5 P6 P7 P8

Figure 14.8 GOP reordered
for transmission

I0 P3 B1 B2 P6 B4 B5 (I9) B7 B8

ISO/IEC 11172-1, Coding of
moving pictures and associated
audio for digital storage media at up
to about 1,5 Mbit/s - Part 1:
Systems [MPEG-1].

If the coded 9-picture GOP of Figure 14.6 were
transmitted in that order, then the decoder would have to
hold the coded B1 and B2 data in a buffer while
receiving and decoding P3; only when decoding of P3
was complete could decoding of B1 start. The encoder
must buffer the B1 and B2 pictures no matter what;
however, to avoid the corresponding consumption of
buffer memory at the decoder, MPEG-2 specifies that
coded B-picture information is reordered so as to be
transmitted after the coded anchor picture. Figure 14.8
indicates the pictures as reordered for transmission.

I have placed I9 in parentheses because it belongs to
the next GOP; the GOP header precedes it. Here, B7
and B8 follow the GOP header.
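The reordering rule can be sketched as a single pass over the pictures in display order: hold each B-picture until the next anchor has been emitted, then emit the held B-pictures. Picture labels such as "I0" and "B1" are illustrative strings. Applied to Figure 14.6 plus the following I9, the function reproduces the order of Figure 14.8.

```python
def transmission_order(display):
    """Reorder pictures from display order so that each anchor (I or P)
    precedes the B-pictures that may be backward-predicted from it."""
    out, pending_b = [], []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)  # hold until the next anchor is sent
        else:
            out.append(pic)        # the anchor goes first...
            out.extend(pending_b)  # ...then the B-pictures before it
            pending_b = []
    out.extend(pending_b)  # trailing B-pictures with no later anchor
    return out


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8", "I9"]
print(transmission_order(display))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5', 'I9', 'B7', 'B8']
```

The decoder undoes this by holding each decoded anchor until the B-pictures that display before it have been shown.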

MPEG-1 

The original MPEG effort resulted in a standard now 
called MPEG-1; it comprises five parts. In the margin,
I cite Part 1: Systems. There are additional parts -
Part 2: Video ; Part 3: Audio ; Part 4: Compliance testing ; 
and Part 5: Software simulation. MPEG-1 was used in 
consumer systems such as CD-V, and has been 
deployed in multimedia applications. MPEG-1 was opti- 
mized for the coding of progressive 352x240 images at 
30 frames per second. MPEG-1 has no provision for 
interlace. When 480/29.97 or 576/25 video is coded 
with MPEG-1 at typical data rates, the first field of each 
frame is coded as if it were progressive; the second field 
is dropped. At its intended data rate of about 1.5 Mb/s, 
MPEG-1 delivers VHS-quality images. 

For video broadcast, MPEG-1 has been superseded by 
MPEG-2. An MPEG-2 decoder must decode MPEG-1 
constrained-parameter bitstream (CPB) sequences - to 
be discussed in the caption to Table 40.1, on 
page 475 - so I will not discuss MPEG-1 further. 
Many MPEG terms - such as
frame, picture, and macroblock - 
can refer to elements of the 
source video, to the corre- 
sponding elements in the coded 
bitstream, or to the corre- 
sponding elements in the recon- 
structed video. It is generally clear 
from context which is meant. 



Symes, Peter, Video Compression 
Demystified (New York: McGraw- 
Hill, 2000). 
MPEG-2

The MPEG-2 effort was initiated to extend MPEG-1 to 
interlaced scanning, to larger pictures, and to data rates 
much higher than 1.5 Mb/s. MPEG-2 is now widely 
deployed for the distribution of digital television, 
including standard-definition television (SDTV), DVD, 
and high-definition television (HDTV). MPEG-2 is 
defined by a series of standards from ISO/IEC.

MPEG-2 accommodates both progressive and inter- 
laced material. A video frame can be coded directly as 
a frame-structured picture. Alternatively, a video frame 
(typically originated from an interlaced source) may be 
coded as a pair of field-structured pictures - a top-field 
picture and a bottom-field picture. The two fields are 
time-offset by half the frame time, and are intended for 
interlaced display. Field pictures always come in pairs 
having opposite parity (top/bottom). Both pictures in 
a field pair have the same picture coding type (I, P, or 
B), except that an I-field may be followed by a P-field
(in which case the pair is treated as an I-frame).

While the MPEG-2 work was underway, an MPEG-3 
effort was launched to address HDTV. The MPEG-3 
committee concluded early on that MPEG-2, at high 
data rate, would accommodate HDTV. Consequently, 
the MPEG-3 effort was abandoned. MPEG-4, MPEG-7, 
and MPEG-21 are underway; the numbers have no 
plan. MPEG-4 is concerned with coding at very low bit 
rates. MPEG-7, titled Multimedia Content Description 
Interface, will standardize description of various types 
of multimedia information (metadata). MPEG-21 seeks
"to establish an open framework for multimedia delivery
and consumption, thereby enabling use of multimedia
resources across a wide range of networks and devices."
In my estimation, none of MPEGs 4, 7, or 21 are rele- 
vant to handling studio- or distribution-quality video 
signals. 

I will detail JPEG and motion-JPEG (M-JPEG) compres- 
sion on page 447, DV compression on page 461, and 
MPEG-2 video compression on page 473. Video and 
audio compression technology is detailed in the book 
by Peter Symes cited in the margin. 
ITU-R Rec. BT.601-5, Studio
encoding parameters of digital tele- 
vision for standard 4:3 and wide- 
screen 16:9 aspect ratios. Should 
this standard be revised, it will be 
denoted Rec. BT.601-6. 



This chapter provides an overview of digital interfaces 
for uncompressed and compressed SDTV and HDTV. 

Component digital SDTV interface (Rec. 601, "4:2:2") 

The notation 4:2:2 originated as a reference to a 
chroma subsampling scheme that I outlined on page 90. 
During the 1980s, it came to denote a specific compo- 
nent digital video interface standard incorporating 4:2:2 
chroma subsampling. In the 1990s, the 4:2:2 chroma 
subsampling format was adopted for HDTV. As a result, 
the notation 4:2:2 is no longer clearly limited to SDTV, 
and no longer clearly denotes a scanning or interface 
standard. To denote the SDTV interface standard, I use 
the term Rec. 601 interface instead of 4:2:2. 



Recall from page 90 that in 
Rec. 601, CB and CR are cosited -
each is centered on the same
location as Y'j, where j is even.



In Rec. 601, at 4:3 aspect ratio, luma is sampled at 
13.5 MHz. CB and CR color difference components are
horizontally subsampled by a factor of 2:1 with respect
to luma - that is, sampled at 6.75 MHz each. Samples
are multiplexed in the sequence {CB, Y'0, CR, Y'1}.
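The multiplexing order can be sketched directly. Each CB/CR pair is cosited with an even-numbered luma sample, and the interface sends CB, then that luma sample, then CR, then the following (odd) luma sample. Sample labels here are illustrative strings.

```python
def multiplex_422(luma, cb, cr):
    """Interleave one line of 4:2:2 samples as CB, Y'0, CR, Y'1, ...

    CB[k] and CR[k] are cosited with luma sample 2k.
    """
    if len(luma) != 2 * len(cb) or len(cb) != len(cr):
        raise ValueError("4:2:2 needs two luma samples per CB/CR pair")
    words = []
    for k in range(len(cb)):
        words += [cb[k], luma[2 * k], cr[k], luma[2 * k + 1]]
    return words


print(multiplex_422(["Y0", "Y1", "Y2", "Y3"], ["Cb0", "Cb1"], ["Cr0", "Cr1"]))
# ['Cb0', 'Y0', 'Cr0', 'Y1', 'Cb1', 'Y2', 'Cr1', 'Y3']
```

Two words are sent per luma sample period, which is why the interface word rate is twice the luma sampling frequency.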



Most 4:2:2 systems now accom- 
modate 10-bit components. 



Sampling at 13.5 MHz produces a whole number of
samples per total line (STL) in 480i systems (with
858 STL) and 576i systems (with 864 STL). The word
rate at the interface is twice the luma sampling 
frequency: For each luma sampling clock, a color differ- 
ence sample and a luma sample are transmitted. An 
8-bit, 4:2:2 interface effectively carries 16 bits per pixel; 
the total data rate is 27 MB/s. A 10-bit serial interface 
effectively carries 20 bits per pixel, and has a total bit 
rate of 270 Mb/s. 
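These numbers can be checked with exact arithmetic. The sketch assumes the standard line rates: 525 total lines at 30000/1001 frames per second for 480i, and 625 lines at 25 frames per second for 576i. It then derives STL and the interface word and bit rates.

```python
from fractions import Fraction

LUMA_RATE = 13_500_000  # Hz: Rec. 601 luma sampling frequency

# Line rates: 525 lines at 30000/1001 frames/s; 625 lines at 25 frames/s.
line_rate_480i = 525 * Fraction(30000, 1001)
line_rate_576i = 625 * Fraction(25)

stl_480i = LUMA_RATE / line_rate_480i
stl_576i = LUMA_RATE / line_rate_576i
print(stl_480i, stl_576i)  # 858 864

# Per luma clock: one color difference word plus one luma word.
word_rate = 2 * LUMA_RATE
print(word_rate * 8 / 8e6, "MB/s at 8 bits")    # 27.0
print(word_rate * 10 / 1e6, "Mb/s at 10 bits")  # 270.0
```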
Figure 15.1 Scan-line waveform for 480i29.97, 4:2:2 component luma. EBU Tech. N10 analog
levels are shown; however, these levels are rarely used in 480i. In analog video, sync is
blacker-than-black, at -300 mV. (In digital video, sync is not coded as a signal level.) This sketch
shows 8-bit interface levels (in bold); black is at code 16 and white is at code 235. The 720 active
samples contain picture information; the remaining 138 sample intervals of the 858 comprise
horizontal blanking.



Rec. 601, adopted in 1984, specified abstract coding 
parameters (including 4:2:2 chroma subsampling). 
Shortly afterwards, a parallel interface using 25-pin
connectors was standardized in SMPTE 125M,
EBU Tech. 3246, and Rec. 656. To enable transmission
across long runs of coaxial cable, parallel interfaces have 
been superseded by the serial digital interface (SDI). 

Both 480i and 576i have 720 active luma samples per
line (S_AL). In uncompressed, 8-bit Rec. 601 video, the
active samples consume about 20 MB/s.
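The "about 20 MB/s" figure follows from counting only active samples. This sketch assumes 480 active lines at 29.97 Hz and 576 active lines at 25 Hz (values not stated in this paragraph), and 2 bytes per pixel: one 8-bit luma byte plus one 8-bit color difference byte, since in 4:2:2 each pixel carries one C_B or one C_R sample. The function name is hypothetical.

```python
def active_rate_bytes_per_s(active_samples, active_lines, frame_rate, bytes_per_pixel=2):
    """Uncompressed 8-bit 4:2:2: each luma sample is paired with one color difference sample."""
    return active_samples * active_lines * frame_rate * bytes_per_pixel


rate_480 = active_rate_bytes_per_s(720, 480, 30000 / 1001)  # assumed 480 active lines, 29.97 Hz
rate_576 = active_rate_bytes_per_s(720, 576, 25)            # assumed 576 active lines, 25 Hz
print(f"{rate_480 / 1e6:.1f} MB/s, {rate_576 / 1e6:.1f} MB/s")  # 20.7 MB/s, 20.7 MB/s
```

Both systems land at roughly the same active data rate because 480 × 29.97 and 576 × 25 are nearly equal line rates of active picture.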

Figure 15.1 above shows the luma (or R', G', or B')
waveform of a single scan line of 480i component
video. The time axis shows sample counts at the
Rec. 601 rate of 13.5 MHz; divide the sample number
by 13.5 to derive time in microseconds. Amplitude is
shown in millivolts (according to EBU Tech. N10 levels),
and in 8-bit Rec. 601 digital interface code values.

Digital video interfaces convey active video framed in
timing reference signal (TRS) sequences including start
of active video (SAV) and end of active video (EAV).
Ancillary data (ANC) and digitized ancillary signals are
permitted in regions not occupied by active video.
Figure 15.2 shows the raster diagram of Chapter 6,
augmented with EAV, SAV, and the HANC and VANC
regions. Details will be presented in Digital sync, TRS,
ancillary data, and interface, on page 389.

Figure 15.2 Rec. 656 component
digital interface uses EAV to
signal the start of each horizontal
blanking interval, and SAV to
signal the start of active video.
Between EAV and SAV, ancillary
data (HANC) can be carried. In
a nonpicture line, the region
between SAV and EAV can carry
ancillary data (VANC). Digitized
ancillary signals may be carried
in lines other than those that
convey either VANC or analog sync.
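To give a concrete picture of a TRS before the full treatment on page 389, here is a sketch. It rests on my understanding of Rec. 656, not on this section: each TRS is the 10-bit word sequence 3FF, 000, 000, XYZ, where XYZ carries the flags F (field), V (vertical blanking), and H (0 for SAV, 1 for EAV) plus four protection bits. Treat the bit layout as an assumption to verify against the standard; `xyz_word` and `trs` are hypothetical helper names.

```python
def xyz_word(f, v, h):
    """8-bit XYZ value: bit 7 = 1, bits 6..4 = F, V, H, bits 3..0 = protection bits P3..P0."""
    p3 = v ^ h
    p2 = f ^ h
    p1 = f ^ v
    p0 = f ^ v ^ h
    return (1 << 7) | (f << 6) | (v << 5) | (h << 4) | (p3 << 3) | (p2 << 2) | (p1 << 1) | p0


def trs(f, v, h):
    """10-bit TRS sequence; the 8-bit XYZ value occupies the upper 8 bits of the 10-bit word."""
    return [0x3FF, 0x000, 0x000, xyz_word(f, v, h) << 2]


for f in (0, 1):
    for v in (0, 1):
        print(f"F={f} V={v}  SAV={xyz_word(f, v, 0):02X}  EAV={xyz_word(f, v, 1):02X}")
```

The protection bits let a receiver detect (and, per my understanding, correct single-bit) errors in the F, V, and H flags, which matters because those flags are how a digital receiver recovers sync.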

Composite digital SDTV (4f_SC) interface

Composite 4f_SC digital interfaces code the entire 8- or
10-bit composite data stream, including sync edges,
back porch, and burst. The interface word rate is the
same as the sampling frequency, typically about half the
rate of a component interface having the same scanning
standard. The 4f_SC interface shares the electrical
and physical characteristics of the 4:2:2 interface.
Composite 4f_SC NTSC has exactly 910 sample intervals
per total line (S_TL), and a data rate of about 143 Mb/s.
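A short check of the 4f_SC NTSC figures, as a sketch. The subcarrier value 315/88 MHz and the line rate 4.5 MHz/286 are standard NTSC values that I am assuming; this section does not state them.

```python
from fractions import Fraction

# Assumed standard NTSC values (not stated in this section):
F_SC = Fraction(315_000_000, 88)        # color subcarrier, about 3.579545 MHz
LINE_RATE = Fraction(4_500_000, 286)    # horizontal rate, about 15 734.27 Hz

fs = 4 * F_SC                           # 4f_SC sampling rate, about 14.318 MHz
s_tl = fs / LINE_RATE                   # sample intervals per total line
bit_rate_10 = fs * 10                   # one 10-bit word per sample at the interface

print(s_tl)                                        # 910
print(round(float(bit_rate_10) / 1e6, 2), "Mb/s")  # 143.18 Mb/s
```

The integer result reflects that, with these values, the subcarrier completes 227.5 cycles per line, so four samples per cycle gives 910 samples per line.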

Composite 4f_SC PAL has a noninteger number of sample
intervals per line: Samples in successive lines are offset
to the left a small fraction (4/625) of the horizontal
sample pitch. Sampling is not precisely orthogonal,
although digital acquisition, processing, and display
equipment treat it so. All but two lines in each frame
have 1135 S_TL; each of the other two lines - preferably
lines 313 and 625 - has 1137 S_TL. For 10-bit 4f_SC, total
data rate (including blanking) is about 177 Mb/s.
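The 1135/1137 arrangement and the 4/625 offset can be reconciled with a few lines of exact arithmetic. This sketch assumes the standard PAL subcarrier of 4.43361875 MHz and a 25 Hz frame rate, neither of which is stated in this paragraph.

```python
from fractions import Fraction

F_SC_PAL = Fraction(443_361_875, 100)   # assumed standard PAL subcarrier, 4.433 618 75 MHz
FRAME_RATE = 25                         # assumed frame rate, Hz
LINES_PER_FRAME = 625

fs = 4 * F_SC_PAL                       # 17 734 475 Hz
samples_per_frame = fs / FRAME_RATE     # a whole number of samples per frame
per_line = samples_per_frame / LINES_PER_FRAME
offset = per_line - 1135                # leftover fraction of a sample pitch on each line

# 623 lines of 1135 S_TL plus two lines of 1137 S_TL fill the frame exactly.
regular_plus_long = (LINES_PER_FRAME - 2) * 1135 + 2 * 1137
bit_rate_10 = fs * 10

print(samples_per_frame, regular_plus_long)         # 709379 709379
print(offset)                                       # 4/625
print(round(float(bit_rate_10) / 1e6, 1), "Mb/s")   # 177.3 Mb/s
```

The frame holds a whole number of samples even though a line does not; the two long lines absorb the 4 surplus samples (4/625 of a pitch on each of 625 lines).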
Serial digital interface (SDI)

SMPTE 259M, 10-Bit 4:2:2 Component
and 4f_SC Composite Digital
Signals - Serial Digital Interface.

Serial digital interface (SDI) refers to a family of
interfaces standardized by SMPTE. The Rec. 601 or 4f_SC
data stream is serialized, then subjected to a scrambling
technique. SMPTE 259M standardizes several interfaces,
denoted by letters A through D as follows:

A Composite 4/ sc NTSC video, about 143 Mb/s 

B Composite 4/ sc PAL video, about 177 Mb/s 

C Rec. 601 4:2:2 component video, 270 Mb/s (This inter- 
face is standardized in Rec. 656.) 

D Rec. 601 4:2:2 component video sampled at 18 MHz to
achieve 16:9 aspect ratio, 360 Mb/s
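The scrambling technique mentioned above is not detailed in this section. My understanding is that SMPTE 259M uses a self-synchronizing scrambler with generator polynomial x^9 + x^4 + 1, followed by NRZI coding (x + 1); treat that as an assumption to check against the standard. A minimal bit-level sketch under that assumption, with hypothetical function names, and a round-trip check:

```python
def sdi_scramble(bits):
    """Scramble with G1(x) = x^9 + x^4 + 1 (self-synchronizing), then NRZI-encode (G2(x) = x + 1).

    `bits` is a sequence of 0/1 values in transmission order; returns line levels (0/1).
    """
    history = [0] * 9      # history[k] holds the scrambled bit from k + 1 clocks ago
    level = 0              # current NRZI line level
    out = []
    for b in bits:
        s = b ^ history[3] ^ history[8]    # taps at delays 4 and 9
        history = [s] + history[:-1]
        level ^= s                         # NRZI: a 1 toggles the level, a 0 holds it
        out.append(level)
    return out


def sdi_descramble(levels):
    """Invert sdi_scramble: undo NRZI, then apply the same x^9 + x^4 + 1 taps to the received bits."""
    history = [0] * 9
    prev_level = 0
    out = []
    for y in levels:
        s = y ^ prev_level                 # NRZI decode: a change of level means 1
        prev_level = y
        out.append(s ^ history[3] ^ history[8])
        history = [s] + history[:-1]
    return out


payload = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0] * 4   # hypothetical test pattern
print(sdi_scramble(payload))
```

Because the descrambler's taps operate on received bits rather than on an independent generator, the receiver needs no separate synchronization of scrambler state; NRZI makes the signal insensitive to polarity inversion.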

Interfaces related to SMPTE 259M are standardized for 
the 483p59.94 systems specified in SMPTE 294M: 

• The 4:2:2p system uses two 270 Mb/s SDI links ("dual 
link"), for a data rate of 540 Mb/s 

• The 4:2:0p system uses a single link at 360 Mb/s 


SMPTE 344M, 540 Mb/s Serial 
Digital Interface. 


SMPTE 344M standardizes an interface at 540 Mb/s,
intended for 480i29.97, 4:4:4:4 component video; this
could be adapted to convey 483p59.94, 4:2:0p video.

SDI is standardized for electrical transmission through 
coaxial cable, and for transmission through optical fiber. 
The SDI electrical interface uses ECL levels, 75 Ω
impedance, BNC connectors, and coaxial cable. Electrical
and mechanical parameters are specified in
SMPTE standards and in Rec. 656; see SDI coding on 
page 396. Fiber-optic interfaces for digital SDTV, speci- 
fied in SMPTE 297M, are straightforward adaptations of 
the serial versions of Rec. 656. 

Component digital HDTV HD-SDI 

The basic coding parameters of HDTV systems are stan- 
dardized in Rec. 709. Various scanning systems are 
detailed in several SMPTE standards referenced in 
Table 13.2, on page 116. 
Component SDTV, composite 4f_SC NTSC, and
composite 4f_SC PAL all have different sample rates and
different serial interface bit rates. In HDTV, a uniform
sample rate of 74.25 MHz is adopted (modified by the
ratio 1000/1001 in applications where compatibility with
59.94 Hz frame rate is required). A serial interface bit
rate of 20 times the sampling rate is used. Variations of
the same standard accommodate mainstream 1080i30,
1080p24, and 720p60 scanning; 1080p30; and the
obsolete 1035i30 system. The integer picture rates 24,
30, and 60 can be modified by the fraction 1000/1001,
giving rates of 23.976 Hz, 29.97 Hz, and 59.94 Hz.
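The 1000/1001 rates and the HD-SDI bit rate can be computed directly. A minimal sketch; note that 1.4835 Gb/s is often quoted truncated as 1.483 Gb/s.

```python
from fractions import Fraction

M = Fraction(1000, 1001)                # the "1000/1001" modifier

for rate in (24, 30, 60):
    print(rate, "Hz ->", round(float(rate * M), 3), "Hz")

fs_hd = Fraction(74_250_000)            # HDTV sampling rate, Hz
bit_rate = 20 * fs_hd                   # serial bit rate: 20 times the sampling rate
print(float(bit_rate) / 1e9, "Gb/s")                    # 1.485 Gb/s
print(f"{float(fs_hd * M) / 1e6:.3f} Mpx/s")            # 74.176 Mpx/s
print(f"{float(bit_rate * M) / 1e9:.4f} Gb/s")          # 1.4835 Gb/s
```

Keeping the modifier as an exact fraction avoids confusing 29.97 Hz (a rounded display value) with the true rate 30000/1001 Hz.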



The 23.976 Hz, 29.97 Hz, and
59.94 Hz frame rates are associated
with a sampling rate of:

$$\frac{74.25}{1.001} \approx 74.176\ \text{Mpx/s}$$

The corresponding HD-SDI
serial interface bit rate is:

$$\frac{1.485}{1.001} \approx 1.483\ \text{Gb/s}$$



See Figure 13.3, on page 115. 



The SDI interface at 270 Mb/s has been adapted to
HDTV by scaling the bit rate by a factor of 5.5, yielding
a fixed bit rate of 1.485 Gb/s. The sampling rate and
serial bit rate for 23.976 Hz, 29.97 Hz, and 59.94 Hz
interfaces are indicated in the margin. This interface is
standardized for Y'C_BC_R, subsampled 4:2:2. Dual-link
HD-SDI can be used to convey R'G'B'A, 4:4:4:4.

HD-SDI accommodates 1080i25.00 and 1080p25.00
variants that might find use in Europe. This is
accomplished by placing the 1920×1080 image array in
a scanning system having 25 Hz rate. S_TL is altered from
the 30 Hz standard to form an 1125/25 raster.
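Because the sampling rate is fixed at 74.25 MHz, changing the frame rate of an 1125-line raster changes S_TL. A sketch of that arithmetic; the 750-line total for 720p60 is my assumption, not stated in this section, and `s_tl` is a hypothetical helper name.

```python
FS_HD = 74_250_000      # HDTV sampling rate, Hz
TOTAL_LINES = 1125      # total lines per frame in the 1080-line family


def s_tl(frame_rate, total_lines=TOTAL_LINES, fs=FS_HD):
    """Sample intervals per total line when fs, total lines, and frame rate are fixed."""
    q, r = divmod(fs, total_lines * frame_rate)
    if r:
        raise ValueError("sampling rate does not give a whole number of samples per line")
    return q


for rate in (30, 25, 24):
    print(f"1125/{rate}: S_TL = {s_tl(rate)}")
print(f"750/60 (assumed 720p60 total): S_TL = {s_tl(60, total_lines=750)}")
```

The 30 Hz case gives the 2200 total sample intervals shown in Figure 15.3; at 25 Hz the line is lengthened rather than the sampling rate changed.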



The standard HDTV analog interfaces use trilevel sync,
instead of the bilevel sync that is used for analog SDTV.
Figure 15.3 opposite shows the scan-line waveform,
including trilevel sync, for 1080i30 HDTV.

SMPTE 292M, Bit-Serial Digital
Interface for High-Definition
Television Systems.

The HD-SDI interface is standardized in SMPTE 292M.
Fiber-optic interfaces for digital HDTV are also specified
in SMPTE 292M.



Interfaces for compressed video 

Compressed digital video interfaces are impractical in 
the studio owing to the diversity of compression 
systems, and because compressed interfaces would 
require decompression capabilities in signal processing 
and monitoring equipment. Compressed 4:2:2 digital 
video studio equipment is usually interconnected 
through uncompressed SDI interfaces. 
Figure 15.3 Scan-line waveform for 1080i30 HDTV component luma. Analog trilevel sync is
shown, excursing ±300 mV. (In digital video, sync is not coded as a signal level.) At an 8-bit
interface, black is represented by code 16 and white by 235. The indicated 1920 active samples
contain picture information; the remaining sample intervals of the 2200 total comprise horizontal
blanking.



Compressed interfaces can be used to transfer video 
into nonlinear editing systems, and to "dub" (dupli- 
cate) between VTRs sharing the same compression 
system. Compressed video can be interfaced directly
using the serial data transport interface (SDTI), to be
described in a moment. The DVB ASI interface is widely
used to convey MPEG-2 transport streams in network 
or transmission applications (but not in production). 
SMPTE SSI is an alternative, though it is not as popular 
as ASI. The IEEE 1394/DV interface, sometimes called 
FireWire or i.LINK, is widely used in the consumer elec- 
tronics arena, and is beginning to be deployed in 
broadcast applications. 

SDTI 

SMPTE 305.2M, Serial Data
Transport Interface.

SMPTE has standardized a derivative of SDI, serial data
transport interface (SDTI), that transmits arbitrary data
packets in place of uncompressed active video. SDTI
can be used to transport DV25 and DV50 compressed
datastreams. Although DV bitstreams are standardized,
different manufacturers have chosen incompatible
techniques to wrap their compressed video data into
SDTI streams. This renders SDTI useful only for
interconnection of equipment from a single manufacturer.
DVB ASI and SMPTE SSI



CENELEC EN 50083-9, Cabled 
distribution systems for television, 
sound and interactive multimedia 
signals - Part 9: Interfaces for 
CATV/SMATV headends and similar 
professional equipment for 
DVB/MPEG-2 transport streams. 



SMPTE 310M, Synchronous 
Serial Interface for MPEG-2 
Digital Transport Stream. 



IEEE 1394, Standard for a High 
Performance Serial Bus. 
The DVB organization has standardized a high-speed
serial interface for an MPEG-2 transport stream - the
asynchronous serial interface (ASI). MPEG-2 transport
packets of 188 bytes are subject to 8b-10b coding,
then serialized. (Optionally, packets that have been
subject to Reed-Solomon encoding can be conveyed;
these packets have 204 bytes each.) The 8b-10b
coding is that of the Fibre Channel standard. The link
operates at the SDI rate of 270 Mb/s; synchronization
(filler) codes are sent while the channel is not occupied
by MPEG-2 data. The standard specifies an electrical
interface whose physical and electrical parameters are
drawn from the SMPTE SDI standard; the standard also
specifies a fiber-optic interface.
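Since 8b-10b coding spends 10 line bits per byte, the 270 Mb/s link caps the transport stream well below 270 Mb/s. A sketch of that upper bound; it ignores any minimum amount of filler the standard may require, so real usable capacity is somewhat lower. The names are hypothetical.

```python
LINK_RATE = 270_000_000                 # ASI line rate, b/s
payload_ceiling = LINK_RATE * 8 // 10   # 8b-10b: each 8-bit byte occupies 10 line bits


def max_packets_per_second(packet_bytes):
    """Upper bound on transport packets per second, ignoring mandatory filler."""
    return payload_ceiling / (packet_bytes * 8)


for size in (188, 204):
    print(size, "byte packets: at most", round(max_packets_per_second(size)), "per second")
```

Any shortfall between the actual MPEG-2 rate and this ceiling is occupied by the filler codes described above.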

A functional alternative to DVB-ASI is the synchronous 
serial interface (SSI), which is designed for use in envi- 
ronments with high RF fields. SSI is standardized in 
SMPTE 310M. As I write this, it is not very popular, 
except for interconnection of ATSC bitstreams to 8-VSB 
modulators. 

IEEE 1394 (FireWire, i.LINK) 

In 1995, the IEEE standardized a general-purpose high- 
speed serial bus capable of connecting up to 63 devices 
in a tree-shaped network through point-to-point 
connections. The link conveys data across two shielded 
twisted pairs (STP), and operates at 100 Mb/s,
200 Mb/s, or 400 Mb/s. Each point-to-point segment is
limited to 4.5 m; there is a limit of 72 m across the 
breadth of a network. Asynchronous and isochronous 
modes are provided; the latter accommodates realtime 
traffic. Apple Computer refers to the interface by its
trademark FireWire. Sony's trademark is i.LINK, though 
Sony commonly uses a 4-pin connector not strictly 
compliant with the IEEE standard. (The 6-pin IEEE 
connector provides power for a peripheral device; 
power is absent from Sony's 4-pin connector. A node 
may have either 4-pin or 6-pin connectors.) 

As I write in 2002, agreement upon IEEE 1394B
("Gigabit 1394") is imminent. For STP media at
a distance of 4.5 m per link, this extends the data rate
to 800 Mb/s, 1.6 Gb/s, or 3.2 Gb/s. In addition, 1394B
specifies four additional media:

• Plastic optical fiber (POF), for distances of up to 50 m,
at data rates of either 100 or 200 Mb/s

• CAT 5 unshielded twisted-pair cable, for distances of
up to 100 m, at 100 Mb/s

• Hard polymer-clad fiber (HPCF), for distances of up to
100 m, at 100 or 200 Mb/s

• Multimode glass optical fiber (GOF), for distances of up
to 100 m at 100, 200, 400, or 800 Mb/s, or 1.6 or
3.2 Gb/s

IEC 61883-1, Consumer
audio/video equipment - Digital
interface - Part 1: General. See
also parts 2 through 5.

SMPTE RP 168, Definition of
Vertical Interval Switching Point
for Synchronous Video Switching.

IEC has standardized the transmission of digital video
over IEEE 1394. Video is digitized according to
Rec. 601, then motion-JPEG coded (using the DV
standard) at about 25 Mb/s; this is colloquially known as
1394/DV25 (or DV25-over-1394). DV coding has been
adapted to 100 Mb/s for HDTV (DV100); a standard for
DV100-over-1394 has been adopted by IEC.

A standard for conveying an MPEG-2 transport stream 
over IEEE 1394 has also been adopted by IEC; however, 
commercial deployment of MPEG-2-over-1394 is slow, 
mainly owing to concerns about copy protection. The 
D-7 (DVCPRO50) and D-9 (Digital-S) videotape 
recorders use DV coding at 50 Mb/s; a standard DV50 
interface across IEEE 1394 is likely to be developed. 

Switching and mixing 

Switching or editing between video sources - 
"cutting" - is done in the vertical interval, so that each 
frame of the resulting video remains intact, without any 
switching transients. When switching between two 
signals in a hardware switcher, if the output signal is to 
be made continuous across the instant of switching, the 
input signals must be synchronous - the 0_V instants of
both signals must match precisely in time. To prevent
switching transients from disturbing vertical sync
elements, switching is done somewhat later than 0_V;
see SMPTE RP 168. 
In a vacuum, light travels
0.299792458 m - very nearly
one foot - each nanosecond.

$$3.579545 \approx 5 \times \frac{63}{88}$$

System phase advances or delays all
components of the signal. Historically,
HORIZONTAL PHASE (or
h phase) altered sync and luma but
left burst and subcarrier untouched.
Timing in analog facilities

Signals propagate through coaxial cable at a speed
between about 0.66 and 0.75 of the speed of light in
a vacuum. Time delay is introduced by long cable runs,
and by processing delay through equipment. Even over
a long run of 300 m (1000 ft) of cable, only a
microsecond or two of delay is introduced - well under
1/4 of a line time for typical video standards. (To reach
a delay of one line time in 480i or 576i would take a run
of about 12 km!) Compensating typical cable delay
requires an adjustment of horizontal timing, by just
a small fraction of a line time.
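The cable-delay figures follow from the propagation velocity. A sketch using the velocity factors quoted above; the line times (286/4.5 MHz for 480i, 64 µs for 576i) are standard values that I am assuming here, and the function names are hypothetical.

```python
C = 299_792_458.0               # speed of light in a vacuum, m/s

LINE_TIME_480I = 286 / 4.5e6    # assumed 480i line time, about 63.56 µs
LINE_TIME_576I = 64e-6          # assumed 576i line time


def cable_delay_s(length_m, velocity_factor):
    """Propagation delay through cable whose signal speed is velocity_factor times C."""
    return length_m / (velocity_factor * C)


def length_for_delay_m(delay_s, velocity_factor):
    """Cable length needed to produce a given propagation delay."""
    return delay_s * velocity_factor * C


for vf in (0.66, 0.75):
    d = cable_delay_s(300, vf)
    print(f"VF {vf}: 300 m -> {d * 1e6:.2f} µs, {d / LINE_TIME_480I:.3f} of a 480i line")

for name, t in (("480i", LINE_TIME_480I), ("576i", LINE_TIME_576I)):
    print(f"one {name} line at VF 0.66: {length_for_delay_m(t, 0.66) / 1000:.1f} km")
```

At the slower velocity factor, 300 m costs about 1.5 µs, a few percent of a line, which is why a small horizontal timing adjustment suffices.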

In analog video, these delays are accommodated by 
advancing the timing at each source, so that each signal 
is properly timed upon reaching the production 
switcher. In a medium-size or large facility, a single sync 
generator (or a pair of redundant sync generators) 
provides house sync, to which virtually everything else 
in the facility is locked with appropriate time advance 
or delay. To enable a seamless switch from a network 
source to a local source in the early days of television 
networks, every television station was locked to timing 
established by its network! Each network had an atomic 
clock, generating 5 MHz. This was divided to subcarrier 
using the relationship in the margin. 
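The margin relationship can be checked exactly: scaling the 5 MHz reference by 63/88 yields the NTSC color subcarrier. This sketch confirms only the arithmetic; how a network's divider chain was actually built is not described here.

```python
from fractions import Fraction

REFERENCE_HZ = Fraction(5_000_000)      # network atomic-clock reference
F_SC = REFERENCE_HZ * Fraction(63, 88)  # NTSC color subcarrier

print(F_SC, "Hz =", float(F_SC) / 1e6, "MHz")  # 39375000/11 Hz, about 3.579545 MHz
```

Because the ratio is rational, every station deriving subcarrier this way from the same reference stays exactly frequency-locked.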

Many studio sources - such as cameras and VTRs - can 
be driven from a reference input that sets the timing of 
the primary output. This process was historically 
referred to as "sync generator locking," or nowadays, as 
genlock. In the absence of a reference signal, equip- 
ment is designed to free-run: Its frequency will be 
within tolerance, but its phase will be unlocked. 

In studio equipment capable of genlock, with factory 
settings the output signal emerges nominally synchro- 
nous with the reference. Studio equipment is capable of 
advancing or delaying its primary output signal with 
respect to the reference, by perhaps ±Va of a line time, 
through an adjustment called system phase. Nowadays, 
some studio video equipment has vertical processing 
that incorporates line delays; such equipment intro- 
duces delay of a line time, or perhaps a few line times. 
For details concerning SCH,
see page 512.



To compensate line delays, system phase must now 
accommodate adjustment of vertical delay as well as 
horizontal. The adjustment is performed by matching 
the timing of the 50%-points of sync. Misadjustment of 
system phase is reflected as position error. 

A studio sync generator can, itself, be genlocked. 

A large facility typically has several sync generators 
physically distributed throughout the facility. Each 
provides local sync, and each is timed according to its 
propagation delay back to the central switching or 
routing point. 

If a piece of studio equipment originates a video signal, 
it is likely to have adjustable system phase. However, if 
it processes a signal, and has no framestore, then it is 
likely to exhibit fixed delay: It is likely to have no 
genlock capability, and no capability to adjust system 
phase. Delay of such a device can be compensated by 
appropriate timing of its source. For example, a typical 
video switcher has fixed delay, and no system phase 
adjustment. (It has a reference input whose sync 
elements are inserted onto the primary outputs of the 
switcher, but there is no genlock function.) 

A routing switcher is a large matrix of crosspoint 
switches. A routing switcher is designed so that any 
path through the switcher incurs the same fixed delay. 

Timing in composite analog NTSC and PAL 

NTSC modulation and demodulation work properly 
provided that burst phase and modulated subcarrier 
phase remain locked: Color coding is independent of 
the phase relationship between subcarrier and sync. 

If two signals are to be switched or mixed, though, their 
modulated subcarrier phase (and therefore their burst 
phases) must match - otherwise, hue would shift as 
mixing took place. But the phase of luma (and there- 
fore of the analog sync waveform) must match as well - 
otherwise, picture position would shift as mixing took 
place. These two requirements led to standardization of 
the relationship of subcarrier to horizontal (SCH) phase. 
It is standard that the zerocrossing of unmodulated
subcarrier be synchronous with 0_H at line 1 of the
frame, within about ±10°. In NTSC, if this requirement is
met at line 1, then the zerocrossing of subcarrier will be
coincident with the analog sync reference point (0_H)
within the stated tolerance on every line. In PAL, the
requirement is stated at line 1, because the phase
relationship changes throughout the frame. Neither NTSC
nor PAL has burst on line 1; SCH must be measured
with regenerated subcarrier, or measured from
burst on some other line (such as line 10).

Subcarrier phase is sometimes
inaccurately called burst phase,
because the adjustment involves
rotating the phase of burst. However,
the primary effect is to adjust the
phase of modulated chroma.

FIFO: First in, first out.

For composite analog switching, it is necessary that the 
signals being mixed have matching 0_V; but in addition,
it is necessary that the signals have matching subcarrier
phase. (If this were not the case, hue would shift during
the transition.) As I have mentioned, cable delay is
accommodated by back-timing. However, with imperfect
cable equalization, cable delay at subcarrier
frequency might be somewhat different from delay at
low frequency. If the source generates zero SCH, you 
could match system timing, but have incorrect subcar- 
rier phase. The solution is to have, at a composite 
source, a subcarrier phase adjustment that rotates the 
phase of subcarrier through 360°. Equipment is timed 
by adjusting system phase to match sync edges (and 
thereby, luma position), then adjusting subcarrier 
phase to match burst phase (and thereby, the phase of 
modulated chroma). 

Timing in digital facilities 

Modern digital video equipment has, at each input, 
a buffer that functions as a FIFO. This buffer at each 
input accommodates an advance of timing at that input 
(with respect to reference video) of up to about 
±100 µs. Timing a digital facility involves advancing
each signal source so that signals from all sources arrive 
in time at the inputs of the facility's main switcher. This 
timing need not be exact: It suffices to guarantee that 
no buffer overruns or underruns. When a routing 
switcher switches among SDI streams, a timing error of 
several dozen samples is tolerable; downstream equip- 
ment will recover timing within one or two lines after 
the instant of switching. 
Some video switchers incorporate
digital video effects (DVE)
capability; a DVE unit necessarily
includes a framestore.



When a studio needs to accommodate an asynchro- 
nous video input - one whose frame rate is within 
tolerance, but whose phase cannot be referenced to 
house sync, such as a satellite feed - then a framestore 
synchronizer is used. This device contains a frame of 
memory that functions as a FIFO buffer for video. An 
input signal with arbitrary timing is written into the 
memory with timing based upon its own sync elements. 
The synchronizer accepts a reference video signal; the 
memory is read out at rates locked to the sync elements 
of the reference video. (Provisions are made to adjust 
system phase - that is, the timing of the output signal 
with respect to the reference video.) An asynchronous 
signal is thereby delayed up to one frame time, perhaps 
even a little more, so as to match the local reference. 
The signal can then be used as if it were a local source. 
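The "up to one frame time, perhaps even a little more" delay of a framestore synchronizer can be modeled with modular arithmetic: each input frame is read out at the next reference frame start that follows its own start. This sketch is a simplification; `min_latency_s` is a hypothetical stand-in for whatever internal processing time a real synchronizer needs, and it is what produces the "little more".

```python
def synchronizer_delay(input_lag_s, frame_period_s, min_latency_s=0.0):
    """Delay from an input frame's start to the reference frame start at which it is read out.

    input_lag_s: how far the input's frame start trails a reference frame start (any real value).
    min_latency_s: hypothetical minimum internal latency of the synchronizer.
    The result lies in [min_latency_s, min_latency_s + frame_period_s).
    """
    slack = (-input_lag_s - min_latency_s) % frame_period_s
    return min_latency_s + slack


FRAME_480I = 1001 / 30000      # about 33.37 ms (assumed 29.97 Hz frame rate)
for lag_ms in (0.0, 5.0, 20.0, 33.0):
    d = synchronizer_delay(lag_ms / 1000, FRAME_480I, min_latency_s=0.001)
    print(f"input lag {lag_ms:5.1f} ms -> delay {d * 1000:6.2f} ms")
```

Because the input frame rate is only within tolerance, the lag drifts slowly; a practical synchronizer must eventually repeat or drop a frame when the delay wraps, which this static model does not capture.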

Some studio video devices incorporate framestores, and 
exhibit latency of a field, a frame, or more. Low-level 
timing of such equipment is accomplished by introducing
time advance so that 0_V appears at the correct
instant. However, even if video content is timed
correctly with respect to 0_V, it may be late by a frame,
or in a very large facility, by several frames. Attention 
must be paid to delaying audio by a similar time 
interval, to avoid lip-sync problems. 
My explanation describes the
original sampling of an analog
signal waveform. If you are more
comfortable remaining in the
digital domain, consider the
problem of shrinking a row of
image samples by a factor of n
(say, n = 16) to accomplish image
resizing. You need to compute
one output sample for each set of
n input samples. This is the
resampling problem in the digital
domain. Its constraints are very
similar to the constraints of
original sampling of an analog signal.



This chapter explains how a one-dimensional signal is 
filtered and sampled prior to A-to-D conversion, and 
how it is reconstructed following D-to-A conversion. In 
the following chapter, Resampling, interpolation, and 
decimation, on page 171, I extend these concepts to 
conversions within the digital domain. In Image digitiza- 
tion and reconstruction, on page 187, I extend these 
concepts to the two dimensions of an image. 

When a one-dimensional signal (such as an audio 
signal) is digitized, each sample must encapsulate, in 
a single value, what might have begun as a complex 
waveform during the sample period. When a 
two-dimensional image is sampled, each sample encap- 
sulates what might have begun as a potentially complex 
distribution of power over a small region of the image 
plane. In each case, a potentially vast amount of infor- 
mation must be reduced to a single number. 

Prior to sampling, detail within the sample interval 
must be discarded. The reduction of information prior 
to sampling is prefiltering. The challenge of sampling is 
to discard this information while avoiding the loss of 
information at scales larger than the sample pitch, all
the while avoiding the introduction of artifacts. Sampling
theory elaborates the conditions under which a signal 
can be sampled and accurately reconstructed, subject 
only to inevitable loss of detail that could not, in any 
event, be represented by a given number of samples in 
the digital domain. 
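The digital-domain version of the problem from the margin note (shrinking a row of samples by a factor of n) can be sketched with the simplest possible prefilter, averaging each group of n samples, and contrasted with point sampling, which keeps every nth sample with no prefilter. This is only an illustration with hypothetical function names; as the discussion of boxcar filtering later in this chapter shows, plain averaging is not an adequate prefilter in general.

```python
def shrink_by_averaging(row, n):
    """One output per group of n inputs; each output is the group mean (a boxcar prefilter).

    A trailing partial group, if any, is dropped.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [sum(row[i:i + n]) / n for i in range(0, len(row) - n + 1, n)]


def point_sample(row, n):
    """Keep every nth sample with no prefiltering."""
    return row[0:len(row) - n + 1:n]


fine_detail = [0, 1] * 32                  # detail far finer than the output pitch
print(shrink_by_averaging(fine_detail, 16))  # [0.5, 0.5, 0.5, 0.5]
print(point_sample(fine_detail, 16))         # [0, 0, 0, 0]
```

Averaging discards the fine detail but preserves its mean; point sampling picks whatever value happens to fall at each sampling instant, which here misrepresents the row entirely.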
Sampling theory was originally developed to describe
one-dimensional signals such as audio, where the signal 
is a continuous function of the single dimension of 
time. Sampling theory has been extended to images, 
where an image is treated as a continuous function of 
two spatial coordinates (horizontal and vertical). 
Sampling theory can be further extended to the 
temporal sampling of moving images, where the third 
coordinate is time. 



Figure 16.1 Cosine waves less
than and greater than 0.5 f_S,
in this case at the fractions
0.35 and 0.65 of the sampling
rate, produce exactly the same
set of sampled values when
point-sampled - they alias.



Sampling theorem 

Assume that a signal to be digitized is well behaved, 
changing relatively slowly as a function of time. 
Consider the cosine signals shown in Figure 16.1 below, 
where the x-axis shows sample intervals. The top
waveform is a cosine at the fraction 0.35 of the sampling
rate f_S; the middle waveform is at 0.65 f_S. The bottom row
shows that identical samples result from sampling either 
of these waveforms: Either of the waveforms can 
masquerade as the same sample sequence. If the 
middle waveform is sampled, then reconstructed 
conventionally, the top waveform will result. This is the 
phenomenon of aliasing. 
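The aliasing in Figure 16.1 is easy to reproduce numerically. Since 0.65 = 1 - 0.35, at integer sample instants n we have cos(2π · 0.65 n) = cos(2πn - 2π · 0.35 n) = cos(2π · 0.35 n). A minimal sketch, with a hypothetical function name:

```python
import math


def point_samples(freq_fraction, count):
    """Point-sample cos(2*pi*f*t) at t = 0, 1, 2, ... (t in sample intervals, f in units of f_S)."""
    return [math.cos(2 * math.pi * freq_fraction * n) for n in range(count)]


low = point_samples(0.35, 20)
high = point_samples(0.65, 20)
print(max(abs(a - b) for a, b in zip(low, high)))   # only floating-point rounding error

# Between samples the two waveforms differ markedly, e.g. at t = 0.5:
print(math.cos(2 * math.pi * 0.35 * 0.5), math.cos(2 * math.pi * 0.65 * 0.5))
```

Nothing in the sample sequence distinguishes the two waveforms; a reconstruction filter will conventionally produce the lower-frequency one.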
Figure 16.2 Cosine
waves at exactly 0.5 f_S
cannot be accurately
represented in a sample
sequence if the phase or
amplitude of the sampled
waveform is arbitrary.

Sampling at exactly 0.5 f_S

You might assume that a signal whose frequency is 
exactly half the sampling rate can be accurately repre- 
sented by an alternating sequence of sample values, 
say, zero and one. In Figure 16.2 above, the series of 
samples in the top row is unambiguous (provided it is 
known that the amplitude of the waveform is unity). 

But the samples of the middle row could be generated 
from any of the three indicated waveforms, and the 
phase-shifted waveform in the bottom row has samples 
that are indistinguishable from a constant waveform 
having a value of 0.5. The inability to accurately analyze 
a signal at exactly half the sampling frequency leads to 
the strict "less-than" condition in the Sampling
Theorem, which I will now describe.
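The ambiguity at exactly 0.5 f_S can be sketched directly. At integer n, cos(πn + φ) = (-1)^n cos φ, so phase and amplitude collapse into a single number. This sketch assumes the waveform 0.5 + 0.5 cos(πn + φ), chosen because it reproduces the unity-amplitude and constant-0.5 cases described in the text; the function name is hypothetical.

```python
import math


def samples_at_half_rate(phase, count=8, offset=0.5, amplitude=0.5):
    """Point-sample offset + amplitude*cos(pi*n + phase): a cosine at exactly 0.5 f_S."""
    return [offset + amplitude * math.cos(math.pi * n + phase) for n in range(count)]


print([round(s, 6) for s in samples_at_half_rate(0.0)])          # alternates 1.0, 0.0
print([round(s, 6) for s in samples_at_half_rate(math.pi / 2)])  # every sample is 0.5

# A phase of pi/3 at amplitude 0.5 is indistinguishable from phase 0 at amplitude 0.25:
print([round(s, 6) for s in samples_at_half_rate(math.pi / 3)])
print([round(s, 6) for s in samples_at_half_rate(0.0, amplitude=0.25)])
```

Only the product amplitude × cos φ survives sampling, so neither quantity can be recovered on its own; hence the strict inequality.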



Nyquist essentially applied to 
signal processing a mathematical 
discovery made in 1915 by E.T. 
Whittaker. Later contributions 
were made by Shannon (in the 
U.S.) and Kotelnikov (in Russia). 



Harry Nyquist, at Bell Labs, concluded in about 1928
that to guarantee sampling of a signal without the
introduction of aliases, all of the signal's frequency
components must be contained strictly within half the
sampling rate (now known as the Nyquist frequency). If
a signal meets this condition, it is said to satisfy the
Nyquist criterion. The condition is usually imposed by
analog filtering, prior to sampling, that removes
frequency components at 0.5 f_S and higher. A filter
must implement some sort of integration. In the
example of Figure 16.1, no filtering was performed; the
waveform was simply point-sampled. The lack of
filtering admitted aliases. Figure 16.3 represents the
waveform of an actual signal; point sampling at the
indicated instants yields sample values that are not
representative of the local neighborhood at each
sampling instant.

Figure 16.3 Point sampling
runs the risk of choosing an
extreme value that is not
representative of the
neighborhood surrounding the
desired sample instant.

Figure 16.4 Boxcar
weighting function has unity
value throughout one sample
interval; elsewhere, its value
is zero.

Figure 16.5 Boxcar filtering
weights the input waveform
with the boxcar weighting
function: Each output sample
is the average across one
sample interval.

Perhaps the most basic way to filter a waveform is to
average the waveform across each sample period. Many
different integration schemes are possible; these can be
represented as weighting functions plotted as
a function of time. Simple averaging uses the boxcar
weighting function sketched in Figure 16.4; its value is
unity during the sample period and zero outside that
interval. Filtering with this weighting function is called
boxcar filtering, since a sequence of these functions
with different amplitudes resembles the profile of
a freight train. Once the weighted values are formed,
the signal is represented by discrete values, plotted for
this example in Figure 16.5. To plot these values as
amplitudes of a boxcar function would wrongly suggest
that a boxcar function should be used as
a reconstruction filter. The shading under the waveform
of Figure 16.3 suggests box filtering.

0.75 f_S. The shaded area
under the curve illustrates
its integral computed by
a boxcar function. The
bottom graph shows that
the sequence of resulting
sample points is dominated
by an alias at 0.25 f_S.

A serious problem with boxcar filtering across each
sample interval is evident in Figure 16.6 above. The top
graph shows a sine wave at 0.75 f_S; the signal exceeds
the Nyquist frequency. The shaded regions show
integration over intervals of one sample period. For the sine
wave at 0.75 f_S, sampled starting at zero phase, the first
two integrated values are about 0.6061; the second
two are about 0.3939. The dominant component of the
filtered sample sequence, shown in the bottom graph,
is at one-quarter of the sampling frequency. Filtering using
a one-sample-wide boxcar weighting function is
inadequate to attenuate signal components above the
Nyquist frequency. An unwanted alias results.
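The quoted values 0.6061 and 0.3939 can be reproduced in closed form: the integral of sin(ωt) over [n, n + 1] is (cos ωn - cos ω(n + 1))/ω. This sketch assumes the waveform is 0.5 + 0.5 sin(2π · 0.75 t); the offset and amplitude of 0.5 are my assumption, chosen because they reproduce the numbers in the text. The function name is hypothetical.

```python
import math


def boxcar_samples(f, count, offset=0.5, amplitude=0.5):
    """Integrate offset + amplitude*sin(2*pi*f*t) over each interval [n, n + 1] (closed form)."""
    w = 2 * math.pi * f
    out = []
    for n in range(count):
        integral_of_sine = (math.cos(w * n) - math.cos(w * (n + 1))) / w
        out.append(offset + amplitude * integral_of_sine)
    return out


print([round(v, 4) for v in boxcar_samples(0.75, 8)])
# [0.6061, 0.6061, 0.3939, 0.3939, 0.6061, 0.6061, 0.3939, 0.3939]
```

The pattern repeats every four samples, which is exactly the 0.25 f_S alias described above.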

Figure 16.6 is another example of aliasing: Owing to 
a poor presampling filter, the sequence of sampled 
values exhibits a frequency component not present in 
the input signal. As this example shows, boxcar integra- 
tion is not sufficient to prevent fairly serious aliasing. 
Strictly speaking, amplitude is an
instantaneous measure that may 
take a positive or negative value. 
Magnitude is properly either an 
absolute value, or a squared or 
root mean square (RMS) value 
representative of amplitude over 
some time interval. The terms are 
often used interchangeably. 



See Linearity on page 21. 



Bracewell, Ronald N., The Fourier 
Transform and its Applications, 
Second Edition (New York: 
McGraw-Hill, 1985). 






Magnitude frequency response 

To gain a general appreciation of aliasing, it is neces- 
sary to understand signals in the frequency domain. The 
previous section gave an example of inadequate 
filtering prior to sampling that created an unexpected 
alias upon sampling. You can determine whether a filter 
has an unexpected response at any frequency by 
presenting to the filter a signal that sweeps through all 
frequencies, from zero, through low frequencies, to 
some high frequency, plotting the response of the filter 
as you go. I graphed such a frequency sweep signal at 
the top of Figure 7.1, on page 66. The middle graph of 
that figure shows a response waveform typical of 
a lowpass filter (LPF), which attenuates high frequency 
signals. The magnitude response of that filter is shown 
in the bottom graph. 

Magnitude response is the RMS average response over 
all phases of the input signal at each frequency. As you 
saw in the previous section, a filter's response can be 
strongly influenced by the phase of the input signal. To 
determine response at a particular frequency, you can 
test all phases at that frequency. Alternatively, provided 
the filter is linear, you can present just two signals - 
a cosine wave at the test frequency and a sine wave at 
the same frequency. The filter's magnitude response at 
any frequency is the absolute value of the vector sum of 
the responses to the sine and the cosine waves. 
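The two-signal method can be sketched in a few lines. This assumes a linear FIR filter given as a list of weights `h` and a test frequency `w` in radians per sample (π is half the sample rate). The helper name `magnitude_response` is hypothetical. The outputs for a cosine input and for a sine input are the two orthogonal components of the response, and the magnitude is the length of their vector sum:

```python
import math

# Illustrative sketch; magnitude_response is a hypothetical helper name.
def magnitude_response(h, w):
    """Magnitude response of FIR weights h at w radians per sample,
    found from the filter's steady-state outputs for a cosine and a sine input."""
    # Output at time n = 0: y[0] = sum over k of h[k] * x[-k]
    y_cos = sum(c * math.cos(-w * k) for k, c in enumerate(h))
    y_sin = sum(c * math.sin(-w * k) for k, c in enumerate(h))
    # The two outputs are orthogonal components; take the length of their vector sum.
    return math.hypot(y_cos, y_sin)

h = [0.5, 0.5]   # the [1, 1] averaging filter, scaled to unity
for w in (0.0, math.pi / 2, math.pi):
    print(f"w = {w:.4f} rad/sample: |H| = {magnitude_response(h, w):.4f}")
```

For this filter the sketch gives unity at DC, about 0.707 at a quarter of the sample rate, and zero at half the sample rate.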

Analytic and numerical procedures called transforms can 
be used to determine frequency response. The Laplace 
transform is appropriate for continuous functions, such 
as signals in the analog domain. The Fourier transform is 
appropriate for signals that are sampled periodically, or 
for signals that are themselves periodic. A variant 
intended for computation on data that has been 
sampled is the discrete Fourier transform (DFT). An 
elegant scheme for numerical computation of the DFT 
is the fast Fourier transform (FFT). The z-transform is 
essentially a generalization of the Fourier transform. All 
of these transforms represent mathematical ways to 
determine a system's response to sine waves over a 
range of frequencies and phases. The result of a trans- 
form is an expression or graph in terms of frequency. 
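As an illustration of the numerical route, here is a minimal sketch that zero-pads a filter's impulse response to N points and evaluates a direct DFT. This is O(N²); an FFT produces the same values faster. The helper names and the padding length of 16 are illustrative assumptions. Bin m corresponds to frequency m·$f_S$/N:

```python
import cmath

# Illustrative sketch; dft and sampled_magnitude_response are hypothetical helper names.
def dft(x):
    """Direct DFT: X[m] = sum over n of x[n] * exp(-j 2 pi m n / N). O(N^2); an FFT is faster."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * m * n / n_pts) for n in range(n_pts))
            for m in range(n_pts)]

def sampled_magnitude_response(h, n_pts=16):
    """|H| at frequencies m * fs / n_pts for m = 0 .. n_pts/2, i.e. from DC to half the sample rate."""
    padded = list(h) + [0.0] * (n_pts - len(h))   # zero-pad the impulse response
    spectrum = dft(padded)
    return [abs(spectrum[m]) for m in range(n_pts // 2 + 1)]

resp = sampled_magnitude_response([0.5, 0.5], 16)
for m, mag in enumerate(resp):
    print(f"{m / 16:.4f} fs: |H| = {mag:.4f}")
```

Padding with more zeros samples the same underlying response more densely. It does not change the response itself.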
Figure 16.7 Frequency response of a boxcar filter. The top graph shows a boxcar weighting function, symmetrical around t = 0. Its frequency spectrum is a sinc function, shown underneath. The solid line shows that at certain frequencies, the filter causes phase inversion. Filter response is usually plotted as magnitude; phase inversion in the stopband is reflected as the absolute (magnitude) values shown in dashed lines.



$$\operatorname{sinc}\,\omega = \begin{cases} 1, & \omega = 0 \\ \dfrac{\sin\omega}{\omega}, & \omega \neq 0 \end{cases}$$

Eq 16.1 sinc is pronounced sink. Formally, its argument is in radians per second (rad·s⁻¹); here I use the conventional symbol ω for that quantity. The term (sin x)/x (pronounced sine ecks over ecks) is often used synonymously with sinc, without mention of the units of the argument. If applied to frequency in hertz, the function could be written (sin 2πf)/2πf.

sinc is unrelated to sync (synchronization).



Magnitude frequency response of a boxcar 

The top graph of Figure 16.7 above shows the weighting function of Point sampling on page 144, as a function of time (in sample intervals). The Fourier transform of the boxcar function - that is, the magnitude frequency response of a boxcar weighting function - takes the shape of (sin x)/x. The response is graphed at the bottom of Figure 16.7, with the frequency axis in units of $\omega = 2\pi f_S$. Equation 16.1 in the margin defines the function. This function is so important that it has been given the special symbol sinc, introduced by Phillip M. Woodward in 1953 as a contraction of sinus cardinalis.

A near-ideal filter in analog video is sometimes called a brick wall filter, though there is no precise definition of this term.

A presampling filter should have fairly uniform response below half the sample rate, to provide good sharpness, and needs to severely attenuate frequencies at and above half the sample rate, to achieve low aliasing. The bottom graph of Figure 16.7 shows that this requirement is not met by a boxcar weighting function. The graph of sinc predicts frequencies where aliasing can be introduced. Figure 16.6 showed an example of a sinewave at 0.75 $f_S$; reading the value of sinc at 1.5π from Figure 16.7 shows that aliasing is expected.
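Reading Figure 16.7 at 1.5π can be checked by evaluating Eq 16.1 directly. The sketch below assumes nothing beyond the equation; the helper name `sinc` is illustrative. The value is −1/(1.5π), about −0.21, or roughly −13.5 dB. That is far from zero, so a boxcar lets a substantial component through at that frequency, and it will alias:

```python
import math

# Illustrative sketch of Eq 16.1; the helper name sinc is hypothetical.
def sinc(w):
    """sinc per Eq 16.1: 1 at w = 0, otherwise sin(w) / w (w in radians)."""
    return 1.0 if w == 0 else math.sin(w) / w

value = sinc(1.5 * math.pi)
print(f"sinc(1.5 pi) = {value:.4f} ({20 * math.log10(abs(value)):.1f} dB)")
```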

You can gain an intuitive understanding of the boxcar 
weighting function by considering that when the input 
frequency is such that an integer number of cycles lies 
under the boxcar, the response will be null. But when 
an integer number of cycles, plus a half-cycle, lies under 
the weighting function, the response will exhibit a local 
maximum that can admit an alias. 
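The cycle-counting intuition can be checked by numerically averaging a cosine over a unit-width boxcar centered at t = 0. Cosine phase is used because, for a window symmetric about t = 0, the sine component always integrates to zero, so cosine phase gives the largest response. The helper name and the step count are illustrative assumptions. Whole numbers of cycles give a null; whole-plus-half numbers give values close to the local peaks of |sinc|:

```python
import math

# Illustrative sketch; boxcar_response is a hypothetical helper name.
def boxcar_response(cycles, steps=20000):
    """Average a unit cosine over a unit-width boxcar centered at t = 0;
    `cycles` is the number of cycles that fit under the boxcar."""
    dt = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = -0.5 + (i + 0.5) * dt          # midpoint rule
        total += math.cos(2 * math.pi * cycles * t) * dt
    return total

for cycles in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print(f"{cycles:3.1f} cycles under the boxcar: {boxcar_response(cycles):+.4f}")
```

The results follow sin(πc)/(πc), where c is the number of cycles: 0 at c = 1, 2, 3, and about −0.21 at c = 1.5.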

To obtain a presampling filter that rejects potential 
aliases, we need to pass low frequencies, up to almost 
half the sample rate, and reject frequencies above it. 

We need a frequency response that is constant at unity up to just below 0.5 $f_S$, whereupon it drops to zero. We need a filter function whose frequency response - not time response - resembles a boxcar.

The sinc weighting function

Remarkably, the Fourier transform possesses the mathematical property of being its own inverse (within a scale factor). In Figure 16.7, the Fourier transform of a boxcar weighting function produced a sinc-shaped frequency response. Figure 16.8 opposite shows a sinc-shaped weighting function; it produces a boxcar-shaped frequency response. So, sinc weighting gives the ideal lowpass filter (ILPF), and it is the ideal temporal weighting function for use in a presampling filter. However, there are several theoretical and practical difficulties in using sinc. In practice, we approximate it.
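One common way to approximate sinc, in line with the remark in the Figure 16.8 caption that practical lowpass filters use coefficients approximating samples of sinc, is to sample the ideal lowpass impulse response at integer taps, truncate it, and normalize for unity DC gain. This is a minimal sketch. The cutoff of 0.25 $f_S$, the 31-tap length, and the helper names are assumptions chosen for illustration; abrupt truncation is the simplest choice, not the best one:

```python
import math

# Illustrative sketch; helper names, cutoff, and tap count are hypothetical choices.
def truncated_sinc_lowpass(cutoff, half_taps):
    """Samples of the ideal lowpass impulse response (cutoff/pi) * sinc(cutoff * k),
    truncated to 2*half_taps + 1 taps and normalized to unity DC gain.
    `cutoff` is the passband edge in radians per sample (pi = half the sample rate)."""
    h = [cutoff / math.pi if k == 0 else math.sin(cutoff * k) / (math.pi * k)
         for k in range(-half_taps, half_taps + 1)]
    dc_gain = sum(h)
    return [c / dc_gain for c in h]

def magnitude(h, w):
    """|H(w)| for FIR weights h at w radians per sample."""
    re = sum(c * math.cos(w * k) for k, c in enumerate(h))
    im = sum(c * math.sin(w * k) for k, c in enumerate(h))
    return math.hypot(re, im)

h = truncated_sinc_lowpass(math.pi / 2, 15)     # cutoff at 0.25 fs, 31 taps
for f in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):        # frequency as a fraction of fs
    print(f"{f:.1f} fs: |H| = {magnitude(h, 2 * math.pi * f):.4f}")
```

The passband stays close to unity and the stopband close to zero. The ripple near the band edge comes from the truncation.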

An analog filter's response is a function of frequency on the positive real axis. In analog signal theory, there is no upper bound on frequency. But in a digital filter the response to a test frequency $f_T$ is identical to the response at $f_T$ offset by any integer multiple of the sampling frequency: The frequency axis "wraps" at multiples of the sampling rate. Sampling theory also dictates "folding" around half the sample rate. Signal components having frequencies at or above half the sample rate (the Nyquist frequency) cannot accurately be represented.
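Wrapping and folding can both be seen in a digital filter's response. This sketch uses the Figure 16.18 weights and an arbitrary test frequency of 0.3π, both chosen only for illustration; the helper name `response` is hypothetical. Adding 2π (the sampling frequency) leaves the complex response unchanged. For real coefficients, mirroring about π (half the sample rate) leaves the magnitude unchanged:

```python
import cmath, math

# Illustrative sketch; the helper name response is hypothetical.
def response(h, w):
    """Complex frequency response of FIR weights h at w radians per sample."""
    return sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(h))

h = [13 / 256, 56 / 256, 118 / 256, 56 / 256, 13 / 256]   # weights of Figure 16.18
w = 0.3 * math.pi                                        # an arbitrary test frequency
wrapped = response(h, w + 2 * math.pi)                   # offset by the sampling frequency
folded = response(h, 2 * math.pi - w)                    # mirrored about half the sample rate
print(abs(response(h, w)), abs(wrapped), abs(folded))
```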
Figure 16.8 The sin(x)/x (or sinc) weighting function is shown in the top graph. Its frequency spectrum, shown underneath, has a boxcar shape: sinc weighting exhibits the ideal properties for a presampling filter. However, its infinite extent makes it physically unrealizable; also, its negative lobes make it unrealizable for transducers of light such as cameras, scanners, and displays. Many practical digital lowpass filters have coefficients that approximate samples of sinc.



The temporal weighting functions used in video are 
usually symmetrical; nonetheless, they are usually 
graphed in a two-sided fashion. The frequency response 
of a filter suitable for real signals is symmetrical about 
zero; conventionally, frequency response is graphed in 
one-sided fashion starting at zero frequency ("DC"). 
Sometimes it is useful to consider or graph frequency 
response in two-sided style. 

Frequency response of point sampling 

The Fourier transform provides an analytical tool to examine frequency response: We can reexamine point sampling. Taking an instantaneous sample of a waveform is mathematically equivalent to using a weighting function that is unity at the sample instant, and zero everywhere else - the weighting function is an impulse. The Fourier transform of an impulse function is constant, unity, at all frequencies. A set of equally spaced impulses is an impulse train; its transform is also an impulse train, having impulses of equal weight at every multiple of the sampling frequency. The sampling operation is represented as multiplication by an impulse train. An unfiltered signal sampled by a set of impulses will admit aliases equally from all input frequencies.
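Point sampling's indifference to input frequency is easy to demonstrate. In this sketch, which uses a normalized sample rate and frequencies chosen only for illustration, a component at 0.25 $f_S$, one a full sampling frequency higher (1.25 $f_S$), and one folded about half the sample rate (0.75 $f_S$) all produce the same sample values:

```python
import math

# Illustrative sketch; the sample rate and test frequencies are arbitrary choices.
fs = 1.0                     # normalized sample rate
f_in = 0.25                  # a legitimate input frequency, below 0.5 fs
f_wrapped = f_in + fs        # one sampling frequency higher
f_folded = fs - f_in         # mirrored about half the sample rate

a = [math.cos(2 * math.pi * f_in * n / fs) for n in range(8)]
b = [math.cos(2 * math.pi * f_wrapped * n / fs) for n in range(8)]
c = [math.cos(2 * math.pi * f_folded * n / fs) for n in range(8)]
print(a, b, c, sep="\n")     # after point sampling, the three cannot be told apart
```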
Clarke, R.J., Transform Coding of
Images (Boston: Academic Press, 
1985). 




Figure 16.11 Gaussian func- 
tion is shown here in its one- 
sided form, with the scaling that 
is usual in statistics, where the 
function (augmented with mean 
and variance terms) is known as 
the normal function. Its integral 
is the error function, erf(x). The 
frequency response of cascaded 
Gaussian filters is Gaussian. 



Fourier transform pairs 

Figure 16.9 opposite shows Fourier transform pairs for 
several different functions. In the left column is a set of 
waveforms; beside each waveform is its frequency spec- 
trum. Functions having short time durations transform 
to functions with widely distributed frequency compo- 
nents. Conversely, functions that are compact in their 
frequency representation transform to temporal func- 
tions with long duration. (See Figure 16.10 overleaf.) 

A Gaussian function - the middle transform pair in Figure 16.9, detailed in Figure 16.11 in the margin - is its own Fourier transform: It has the notable property of transforming to itself (within a scale factor). The Gaussian function has moderate spread both in the time domain and in the frequency domain; it has infinite extent, but becomes negligibly small more than a few units from the origin. The Gaussian function lies at the balance point between the distribution of power in the time domain and the distribution of power in the frequency domain.
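The self-transforming property can be checked numerically. This sketch assumes the convention G(f) = ∫ g(t) e^{−j2πft} dt and the scaling g(t) = e^{−πt²}, under which the scale factor is exactly one. That scaling differs from the statistics scaling used in Figure 16.11. The integration limits and step count are illustrative choices:

```python
import cmath, math

# Illustrative sketch; fourier_transform is a hypothetical helper name.
def fourier_transform(g, f, t_max=6.0, steps=12000):
    """Approximate G(f) = integral of g(t) exp(-j 2 pi f t) dt over [-t_max, t_max]
    by the midpoint rule."""
    dt = 2 * t_max / steps
    total = 0j
    for i in range(steps):
        t = -t_max + (i + 0.5) * dt
        total += g(t) * cmath.exp(-2j * math.pi * f * t) * dt
    return total

def gaussian(t):
    return math.exp(-math.pi * t * t)   # scaled so that it transforms to itself

for f in (0.0, 0.5, 1.0, 1.5):
    G = fourier_transform(gaussian, f)
    print(f"f = {f}: G(f) = {G.real:.6f}, gaussian(f) = {gaussian(f):.6f}")
```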

Analog filters 

Analog filtering is necessary prior to digitization, to 
bring a signal into the digital domain without aliases. 

I have described filtering as integration using different 
weighting functions; an antialiasing filter performs the 
integration using analog circuitry. 

An analog filter performs integration by storing a magnetic field in an inductor (coil) using the electrical property of inductance (L), and/or by storing an electrical charge in a capacitor using the electrical property of capacitance (C). In low-performance filters, resistance (R) is used as well. An ordinary analog filter has an impulse response that is infinite in temporal extent.

The design of analog filters is best left to specialists. 

Digital filters 

Once digitized, a signal can be filtered directly in the digital domain. Design and implementation of such filters - in hardware, firmware, or software - is the domain of digital signal processing (DSP). Filters like the ones that I have been describing are implemented digitally by computing weighted sums of samples.

Figure 16.9 Fourier transform pairs for several functions are shown in these graphs. In the left column is a set of waveforms in the time domain; beside each waveform is its frequency spectrum.

Figure 16.10 Waveforms of three temporal extents are shown on the left; the corresponding transforms are shown on the right. Spectral width is inversely proportional to temporal extent, not only for the Gaussians shown here, but for all waveforms.

Averaging neighboring samples is the simplest form of moving average (MA) filter.

Perhaps the simplest digital filter is one that just sums adjacent samples; the weights in this case are [1, 1].

Figure 16.12 on the facing page shows the frequency response of such a [1, 1] filter. This filter offers minimal attenuation to very low frequencies; as signal frequency approaches half the sampling rate, the response follows a cosine curve to zero. This is a very simple, very cheap lowpass filter (LPF).

I have drawn in gray the filter's response from 0.5 $f_S$ to the sampling frequency. In a digital filter, frequencies in this region are indistinguishable from frequencies between 0.5 $f_S$ and 0. The gain of this filter at zero frequency (DC) is 2, the sum of its coefficients. Normally, the coefficients of such a filter are normalized to sum to unity, so that the overall DC gain of the filter is one. In this case the normalized coefficients would be [½, ½]. However, it is inconvenient to call this a [½, ½]-filter; colloquially, this is a [1, 1]-filter.
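The cosine-shaped response and the effect of normalization can be verified directly. In this sketch the helper name `fir_magnitude` is hypothetical. The unscaled [1, 1] filter has a DC gain of 2, equal to its coefficient sum. The scaled [½, ½] filter has magnitude |cos(ω/2)|, where ω is in radians per sample:

```python
import math

# Illustrative sketch; fir_magnitude is a hypothetical helper name.
def fir_magnitude(h, w):
    """|H(w)| for FIR weights h at w radians per sample."""
    re = sum(c * math.cos(w * k) for k, c in enumerate(h))
    im = sum(c * math.sin(w * k) for k, c in enumerate(h))
    return math.hypot(re, im)

raw = [1, 1]
print("DC gain of [1, 1]:", fir_magnitude(raw, 0.0))   # the coefficient sum, 2
scaled = [c / sum(raw) for c in raw]                    # [0.5, 0.5]
for f in (0.0, 0.125, 0.25, 0.375, 0.5):                # fraction of the sample rate
    w = 2 * math.pi * f
    print(f"{f:.3f} fs: |H| = {fir_magnitude(scaled, w):.4f}, cos(w/2) = {math.cos(w / 2):.4f}")
```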
Figure 16.12 [1, 1] FIR filter sums two adjacent samples; this forms a simple lowpass filter. I'll introduce the term FIR on page 157.

Figure 16.13 [1, -1] FIR filter subtracts one sample from the previous sample; this forms a simple highpass filter.

Figure 16.14 [1, 0, 1] FIR filter averages a sample and the second preceding sample, ignoring the sample in between; this forms a bandreject ("notch," or "trap") filter at 0.25 $f_S$.

Figure 16.15 [1, 0, -1] FIR filter subtracts one sample from the second previous sample, ignoring the sample in between; this forms a bandpass filter centered at 0.25 $f_S$.







Digital filters can be implemented in software, firm- 
ware, or hardware. At the right side of each graph 
above, I show the block diagrams familiar to hardware 
designers. Each block labelled R designates a register; 
a series of these elements forms a shift register. 

Figure 16.16 Block diagram of 5-tap FIR filter comprises four registers and an adder; five adjacent samples are summed. Prior to scaling to unity, the coefficients are [1, 1, 1, 1, 1].

A bandpass (bandstop) filter is considered narrowband if its passband (stopband) covers an octave or less. (See page 19.)

If a filter like that of Figure 16.16 has many taps, it needs many adders. Its arithmetic can be simplified by using an accumulator to form the running sum of input samples, another accumulator to form the running sum of outputs from the shift register, and a subtractor to take the difference of these sums. This structure is called a cascaded integrator comb (CIC).

A simple highpass filter (HPF) is formed by subtracting each sample from the previous sample: This filter has weights [1, -1]. The response of this filter is graphed in Figure 16.13. In general, and in this case, a highpass filter is obtained when a lowpass-filtered version of a signal is subtracted from the unfiltered signal. The unfiltered signal can be considered as a two-tap filter having weights [1, 0]. Subtracting the weights [½, ½] of the scaled lowpass filter from that yields the scaled weights [½, -½] of this highpass filter.
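Building the highpass filter by subtracting weights can be checked numerically. This is a sketch with a hypothetical helper name. Subtracting [½, ½] from the "unfiltered" [1, 0] gives [½, −½], whose response is zero at DC and unity at half the sample rate. This mirrors the lowpass filter:

```python
import math

# Illustrative sketch; fir_magnitude is a hypothetical helper name.
def fir_magnitude(h, w):
    """|H(w)| for FIR weights h at w radians per sample."""
    re = sum(c * math.cos(w * k) for k, c in enumerate(h))
    im = sum(c * math.sin(w * k) for k, c in enumerate(h))
    return math.hypot(re, im)

identity = [1.0, 0.0]    # the unfiltered signal, viewed as a two-tap filter
lowpass = [0.5, 0.5]     # the scaled [1, 1] filter
highpass = [a - b for a, b in zip(identity, lowpass)]
print(highpass)          # [0.5, -0.5]
for f in (0.0, 0.25, 0.5):
    w = 2 * math.pi * f
    print(f"{f:.2f} fs: lowpass {fir_magnitude(lowpass, w):.4f}, "
          f"highpass {fir_magnitude(highpass, w):.4f}")
```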

Figure 16.14 shows the response of a filter that adds 
a sample to the second previous sample, disregarding 
the central sample. The weights in this case are [1, 0, 1]. 
This forms a simple band reject filter (BRF), also known 
as a bandstop or notch filter, or trap. Here, the response 
has a null at one quarter the sampling frequency. The 
scaled filter passes DC with no attenuation. This filter 
would make a mess of image data - if a picket fence 
whose pickets happened to lie at a frequency of 0.25 $f_S$
were processed through this filter, the pickets would 
average together and disappear! It is a bad idea to 
apply such a filter to image data, but this filter (and 
filters like it) can be very useful for signal processing 
functions. 
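The disappearing picket fence can be reproduced with a direct-form FIR filter. In this sketch the helper name `fir_filter` is hypothetical, and samples before the start of the input are assumed to be zero. A cosine at 0.25 $f_S$ runs 1, 0, −1, 0, …, so each sample cancels the one two positions earlier, while a constant (DC) input passes unchanged:

```python
import math

# Illustrative sketch; fir_filter is a hypothetical helper name.
def fir_filter(h, x):
    """Causal FIR filter: y[n] = sum over k of h[k] * x[n - k] (samples before x[0] taken as zero)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

notch = [0.5, 0.0, 0.5]                                             # scaled [1, 0, 1]
pickets = [math.cos(2 * math.pi * 0.25 * n) for n in range(12)]     # "picket fence" at 0.25 fs
flat = [1.0] * 12                                                   # DC

print(fir_filter(notch, pickets)[2:])   # pickets vanish (skipping two start-up samples)
print(fir_filter(notch, flat)[2:])      # DC passes with no attenuation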

Figure 16.15 shows the response of a filter that subtracts a sample from the second previous sample, disregarding the central sample. Its weights are [1, 0, -1]. This forms a simple bandpass filter (BPF).

The weights sum to zero - this filter blocks DC. The BPF of this example is complementary to the [1, 0, 1] filter.
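The complementary relationship can be made concrete. For the scaled weights, the notch filter's magnitude is |cos ω| and the bandpass filter's is |sin ω|, with ω in radians per sample. Their squared magnitudes therefore sum to one at every frequency, so the pair is power-complementary. This sketch checks that numerically; the helper name is hypothetical:

```python
import math

# Illustrative sketch; fir_magnitude is a hypothetical helper name.
def fir_magnitude(h, w):
    """|H(w)| for FIR weights h at w radians per sample."""
    re = sum(c * math.cos(w * k) for k, c in enumerate(h))
    im = sum(c * math.sin(w * k) for k, c in enumerate(h))
    return math.hypot(re, im)

notch = [0.5, 0.0, 0.5]       # scaled [1, 0, 1]
bandpass = [0.5, 0.0, -0.5]   # scaled [1, 0, -1]
for f in (0.0, 0.125, 0.25, 0.375, 0.5):
    w = 2 * math.pi * f
    n_mag, b_mag = fir_magnitude(notch, w), fir_magnitude(bandpass, w)
    print(f"{f:.3f} fs: notch {n_mag:.4f}, bandpass {b_mag:.4f}, "
          f"sum of squares {n_mag ** 2 + b_mag ** 2:.4f}")
```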

Figure 16.16 above shows the block diagram of a 5-tap FIR filter that sums five successive samples. As shown in the light gray curve in Figure 16.17 at the top of the facing page, this yields a lowpass filter. Its frequency response has two zeros: Any input signal at 0.2 $f_S$ or 0.4 $f_S$ will vanish; attenuation in the stopband reaches only about -12 dB, at 3/10 of the sampling rate.
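The sketch below checks the 5-tap boxcar's zeros at 0.2 $f_S$ and 0.4 $f_S$ and its roughly −12 dB stopband peak at 0.3 $f_S$. It also computes the same 5-sample running sum with the accumulator-and-subtractor (CIC) arrangement described in the margin. The helper names and the test data are illustrative assumptions. Python integers never overflow, so this sketch does not model the fixed-width arithmetic of a hardware CIC:

```python
import math

# Illustrative sketch; all helper names are hypothetical.
def fir_magnitude(h, w):
    """|H(w)| for FIR weights h at w radians per sample."""
    re = sum(c * math.cos(w * k) for k, c in enumerate(h))
    im = sum(c * math.sin(w * k) for k, c in enumerate(h))
    return math.hypot(re, im)

boxcar5 = [0.2] * 5                        # [1, 1, 1, 1, 1] scaled to unity
for f in (0.2, 0.3, 0.4):
    mag = fir_magnitude(boxcar5, 2 * math.pi * f)
    db = 20 * math.log10(mag) if mag > 1e-12 else float("-inf")
    print(f"{f:.1f} fs: |H| = {mag:.4f} ({db:.1f} dB)")

def moving_sum_cic(x, n_taps):
    """Running sum of the last n_taps inputs: accumulate the inputs, accumulate the
    inputs delayed by n_taps (the shift-register output), and subtract."""
    acc_in = acc_delayed = 0
    out = []
    for i, sample in enumerate(x):
        acc_in += sample
        acc_delayed += x[i - n_taps] if i >= n_taps else 0
        out.append(acc_in - acc_delayed)
    return out

def moving_sum_direct(x, n_taps):
    """The same running sum with one adder per tap."""
    return [sum(x[max(0, i - n_taps + 1): i + 1]) for i in range(len(x))]

data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(moving_sum_cic(data, 5) == moving_sum_direct(data, 5))   # True
```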
Figure 16.17 5-tap FIR filter responses are shown for several choices of coefficient values (tap weights).




In the design of digital filters, control of frequency response is exercised in the choice of tap weights. Figure 16.18 below shows the block diagram of a filter having fractional coefficients chosen from a Gaussian waveform. The mid-gray curve in Figure 16.17 shows that this set of tap weights yields a lowpass filter having a Gaussian frequency response. By using negative coefficients, low-frequency response can be extended without deteriorating performance at high frequencies. The black curve in Figure 16.17 shows the response of a filter having coefficients [-32/256, 72/256, 176/256, 72/256, -32/256]. This filter exhibits the same attenuation at high frequencies (about -18 dB) as the Gaussian, but has about twice the -6 dB frequency.
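The two 5-tap coefficient sets can be compared numerically. Because both sets are symmetric, the response measured about the center tap is real, and its sign shows phase inversion. In this sketch (hypothetical helper names), both sets sum to unity and both have a gain magnitude of 0.125, about −18 dB, at half the sample rate. The set with negative coefficients also has a peak gain above unity, near 1.1:

```python
import math

# Illustrative sketch; fir_gain and peak_gain are hypothetical helper names.
def fir_gain(h, w):
    """Real-valued gain of a symmetric FIR filter, measured about its center tap."""
    center = len(h) // 2
    return sum(c * math.cos(w * (k - center)) for k, c in enumerate(h))

def peak_gain(h, points=1000):
    """Largest gain on a grid from DC to half the sample rate."""
    return max(fir_gain(h, math.pi * i / points) for i in range(points + 1))

gaussian = [c / 256 for c in (13, 56, 118, 56, 13)]
sharper = [c / 256 for c in (-32, 72, 176, 72, -32)]

for name, h in (("Gaussian", gaussian), ("negative lobes", sharper)):
    nyquist_db = 20 * math.log10(abs(fir_gain(h, math.pi)))
    print(f"{name}: sum {sum(h):.3f}, gain at 0.5 fs {nyquist_db:.1f} dB, "
          f"peak gain {peak_gain(h):.4f}")
```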

Negative coefficients, as in the last example here, potentially cause production of output samples that exceed unity. (In this example, output samples above unity are produced at input frequencies about ω = 0.3π, roughly 0.15 times the sampling rate.) If extreme values are clipped, artifacts will result. To avoid artifacts, the signal coding range must include suitable footroom and headroom.

Figure 16.18 5-tap FIR filter including multipliers has coefficients [13, 56, 118, 56, 13], scaled by 1/256. The coefficients approximate a Gaussian; so does the frequency response. The multipliers can be implemented by table lookup.

Figure 16.19 Comb filter block diagram includes several delay elements and an adder.



For details concerning imple- 
mentation structures, see the 
books by Lyons and Rorabaugh 
cited on page 170. 



The operation of an FIR filter amounts to multiplying a set of input samples by a set of filter coefficients (weights), and forming the appropriate set of sums of these products. The weighting can be implemented using multipliers or by using table lookup techniques. With respect to a complete set of input samples, this operation is called convolution. Ordinarily, convolution is conceptualized as taking place one multiplication at a time. An n-tap FIR filter can be implemented using a single multiplier-accumulator (MAC) component operating at n times the sample rate. A direct implementation with n multiplier components, or a multiplexed implementation with a single MAC, accepts input samples and delivers output samples in temporal order: Each coefficient needs to be presented to the filter n times. However, convolution is symmetrical with respect to input samples and coefficients: The same set of results can be produced by presenting filter coefficients one at a time to a MAC, and accumulating partial output sums for each output sample. FIR filters have many potential implementation structures.
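The two orderings of convolution described above can be sketched side by side. One produces each output in turn by running through all the coefficients. The other presents each coefficient in turn and adds its products into partial output sums. Both give identical results. The function names are hypothetical, and the integer weights are the Figure 16.18 coefficients before scaling by 1/256:

```python
# Illustrative sketch; both function names are hypothetical.
def convolve_output_order(h, x):
    """One output at a time: a multiplier-accumulator runs through all coefficients
    for each output sample (y[i] = sum over k of h[k] * x[i - k])."""
    y = []
    for i in range(len(x) + len(h) - 1):
        acc = 0
        for k, c in enumerate(h):
            if 0 <= i - k < len(x):
                acc += c * x[i - k]
        y.append(acc)
    return y

def convolve_coefficient_order(h, x):
    """One coefficient at a time: each coefficient multiplies every input sample,
    and the products are added into partial output sums."""
    partial = [0] * (len(x) + len(h) - 1)
    for k, c in enumerate(h):
        for j, sample in enumerate(x):
            partial[j + k] += c * sample
    return partial

h = [13, 56, 118, 56, 13]          # Figure 16.18 weights, before the 1/256 scaling
x = [0, 0, 256, 256, 256, 0, 0]
print(convolve_output_order(h, x) == convolve_coefficient_order(h, x))   # True
```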



Figure 16.19 above shows the block diagram of an FIR filter having eight taps weighted [1, 0, 0, ..., 0, 1].

The frequency response of this filter is shown in Figure 16.20 at the top of the facing page. The response peaks when an exact integer number of cycles lie underneath the filter; it nulls when an integer-and-a-half cycles lie underneath. The peaks all have the same magnitude: The response is the same when exactly 1, 2, ..., or n cycles are within its window. The magnitude frequency response of such a filter has a shape resembling a comb, and the filter is called a comb filter.
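The comb's teeth can be located numerically. This sketch assumes the eight-tap [1, 0, …, 0, 1] weights scaled by ½; the helper name is hypothetical. The two nonzero taps are seven sample intervals apart. The magnitude is unity whenever a whole number of cycles spans those seven intervals, and zero when a whole number plus a half does:

```python
import math

# Illustrative sketch; fir_magnitude is a hypothetical helper name.
def fir_magnitude(h, w):
    """|H(w)| for FIR weights h at w radians per sample."""
    re = sum(c * math.cos(w * k) for k, c in enumerate(h))
    im = sum(c * math.sin(w * k) for k, c in enumerate(h))
    return math.hypot(re, im)

comb = [0.5] + [0.0] * 6 + [0.5]   # eight taps: [1, 0, ..., 0, 1] scaled by 1/2
span = len(comb) - 1               # seven sample intervals between the nonzero taps
for m in range(4):
    peak_w = 2 * math.pi * m / span            # m whole cycles under the filter
    null_w = 2 * math.pi * (m + 0.5) / span    # m cycles plus a half
    print(f"m = {m}: peak {fir_magnitude(comb, peak_w):.4f}, "
          f"null {fir_magnitude(comb, null_w):.2e}")
```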
Figure 16.20 Comb filter response resembles the teeth of a comb. This filter has unity response at zero frequency: It passes DC.

A filter having weights [½, 0, 0, ..., 0, -½] blocks DC.



For details of the relationship 
between the Dirac delta, the 
Kronecker delta, and sampling in 
DSP, see page 122 of Rorabaugh's 
book, cited on page 170. 
Impulse response

I have explained filtering as weighted integration along 
the time axis. I coined the term temporal weighting 
function to denote the weights. I consider my explana- 
tion of filtering in terms of its operation in the temporal 
domain to be more intuitive to a digital technologist 
than a more conventional explanation that starts in the 
frequency domain. But my term temporal weighting 
function is nonstandard, and I must now introduce the 
usual but nonintuitive term impulse response. 

An analog impulse signal has infinitesimal duration, infi- 
nite amplitude, and an integral of unity. (An analog 
impulse is conceptually equivalent to the Dirac or 
Kronecker deltas of mathematics.) A digital impulse 
signal is a solitary sample having unity amplitude amid 
a stream of zeros. The impulse response of a digital filter
is its response to an input that is identically zero except 
for a solitary unity-valued sample. 

Finite impulse response (FIR) filters 

In each of the filters that I have described so far, only 
a few coefficients are nonzero. When a digital impulse 
is presented to such a filter, the result is simply the 
weighting coefficients scanned out in turn. The 
response to an impulse is limited in duration; the exam- 
ples that I have described have finite impulse response. 
They are FIR filters. In these filters, the impulse 
response is identical to the set of coefficients. The 
digital filters that I described on page 150 implement temporal weighting directly. The impulse responses of these filters, scaled to unity, are [½, ½], [½, -½], [½, 0, ½], and [½, 0, -½], respectively.
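The statement that an FIR filter's impulse response is simply its coefficients scanned out in turn is easy to check. This sketch feeds a digital impulse (a solitary unity sample amid zeros) to each of the four scaled filters. The helper name `fir_filter` is hypothetical, and samples before the start of the input are assumed to be zero:

```python
# Illustrative sketch; fir_filter is a hypothetical helper name.
def fir_filter(h, x):
    """Causal FIR filter: y[n] = sum over k of h[k] * x[n - k] (samples before x[0] taken as zero)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

impulse = [0, 0, 1, 0, 0, 0, 0]    # a solitary unity sample amid zeros
filters = ([0.5, 0.5], [0.5, -0.5], [0.5, 0, 0.5], [0.5, 0, -0.5])
for h in filters:
    print(h, "->", fir_filter(h, impulse))   # the weights appear, starting at the impulse
```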
In Equation 16.2, g is a sequence (whose index is enclosed in square brackets), not a function (whose argument would be in parentheses). $s_j$ is sample number j.



Eq 16.2 



Symmetry:

$$f(x) = f(-x)$$

Antisymmetry:

$$f(x) = -f(-x)$$

Here I use the word truncation to 
indicate the forcing to zero of a 
filter's weighting function beyond 
a certain tap. The nonzero coeffi- 
cients in a weighting function may 
involve theoretical values that have 
been quantized to a certain 
number of bits. This coefficient 
quantization can be accomplished 
by rounding or by truncation. Be 
careful to distinguish between 
truncation of impulse response and 
truncation of coefficients. 



The particular set of weights in Figure 16.18 approximates a sampled Gaussian waveform; so, the frequency response of this filter is approximately Gaussian. The action of this filter can be expressed algebraically:

$$g[j] = \frac{13}{256}\,s_{j-2} + \frac{56}{256}\,s_{j-1} + \frac{118}{256}\,s_{j} + \frac{56}{256}\,s_{j+1} + \frac{13}{256}\,s_{j+2}$$
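Equation 16.2 can be applied directly as a centered weighted sum. This sketch evaluates it across a step edge; the helper name `eq_16_2` and the step data are illustrative. Because the weights are symmetric and sum to 256, the output is symmetric about the edge: 13 pairs with 243 and 69 pairs with 187, each pair summing to 256:

```python
# Illustrative sketch of Eq 16.2; eq_16_2 is a hypothetical helper name.
def eq_16_2(s, j):
    """g[j] per Eq 16.2: a centered 5-tap weighted sum around sample s[j]."""
    weights = (13, 56, 118, 56, 13)              # each divided by 256
    return sum(w * s[j + offset] for w, offset in zip(weights, range(-2, 3))) / 256

step = [0] * 6 + [256] * 6                       # a step edge between s[5] and s[6]
print([eq_16_2(step, j) for j in range(2, len(step) - 2)])
```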

I have described impulse responses that are symmet- 
rical around an instant in time. You might think t = 0 
should denote the beginning of time, but it is usually 
convenient to shift the time axis so that t = 0 corre- 
sponds to the central point of a filter's impulse 
response. An FIR (or nonrecursive) filter has a limited
number of coefficients that are nonzero. When the 
input impulse lies outside this interval, the response is 
zero. Most digital filters used in video are FIR filters, 
and most have impulse responses either symmetric or 
antisymmetric around t = 0. 

You can view an FIR filter as having a fixed structure, 
with the data shifting along underneath. Alternatively, 
you might think of the data as being fixed, and the filter 
sliding across the data. Both notions are equivalent. 

Physical realizability of a filter 

In order to be implemented, a digital filter must be 
physically realizable: It is a practical necessity to have 
a temporal weighting function (impulse response) of 
limited duration. An FIR filter requires storage of several 
input samples, and it requires several multiplication 
operations to be performed during each sample period. 
The number of input samples stored is called the order 
of the filter, or its number of taps. If a particular filter 
has fixed coefficients, then its multiplications can be 
performed by table lookup. A straightforward tech- 
nique can be used to exploit the symmetry of the 
impulse response to eliminate half the multiplications; 
this is often advantageous! 
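One form of the symmetry technique is sketched below. Because h[k] equals h[n−1−k], the filter can add each pair of samples that share a coefficient first, and then multiply once per pair. For a 5-tap filter this takes 3 multiplications instead of 5, roughly half. The function names and sample values are illustrative assumptions:

```python
# Illustrative sketch; both function names are hypothetical.
def symmetric_fir_direct(h, window):
    """Direct form: one multiplication per tap."""
    return sum(c * s for c, s in zip(h, window))

def symmetric_fir_folded(h, window):
    """Folded form for symmetric h (h[k] == h[n-1-k]): add the two samples that share
    a coefficient, then multiply once per pair. Returns (result, multiplications used)."""
    n = len(h)
    acc = 0
    multiplies = 0
    for k in range(n // 2):
        acc += h[k] * (window[k] + window[n - 1 - k])
        multiplies += 1
    if n % 2:                                    # odd length: center tap has no partner
        acc += h[n // 2] * window[n // 2]
        multiplies += 1
    return acc, multiplies

h = [13, 56, 118, 56, 13]
window = [7, 3, 9, 4, 1]
folded, count = symmetric_fir_folded(h, window)
print(symmetric_fir_direct(h, window), folded, count)   # 1558 1558 3
```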

When a temporal weighting function is truncated past a certain point, its transform - its frequency response characteristics - will suffer. The science and craft of filter design involves carefully choosing the order of the filter - that is, the position beyond which the weighting function is forced to zero. That position needs to be far enough from the center tap that the filter's high-frequency response is small enough to be negligible for the application.

Figure 16.21 Linear phase (panels: 125 ns delay, 45° at 1 MHz; 125 ns delay, 90° at 2 MHz).

Signal processing accommodates the use of impulse 
responses having negative values, and negative coeffi- 
cients are common in digital signal processing. But 
image capture and image display involve sensing and 
generating light, which cannot have negative power, so 
negative weights cannot always be realized. If you study 
the transform pairs on page 151 you will see that your 
ability to tailor the frequency response of a filter is 
severely limited when you cannot use negative weights. 

Impulse response is generally directly evident in the 
design of an FIR digital filter. Although it is possible to 
implement a boxcar filter directly in the analog domain, 
analog filters rarely implement temporal weighting 
directly, and the implementation of an analog filter 
generally bears a nonobvious relationship to its impulse 
response. Analog filters are best described in terms of 
Laplace transforms, not Fourier transforms. Impulse 
responses of analog filters are rarely considered directly 
in the design process. Despite the major conceptual 
and implementation differences, analog filters and FIR 
filters - and IIR filters, to be described - are all characterized by their frequency response.

Phase response (group delay) 

Until now I have described the magnitude frequency response of filters. Phase frequency response - often called phase response - is also important. Consider a symmetrical FIR filter having 15 taps. No matter what the input signal, the output will have an effective delay of 7 sample periods, corresponding to the central (eighth) sample of the filter's impulse response. The time delay of such a filter is constant, independent of frequency.

What a signal processing engineer calls an IIR filter is known in the finance and statistics communities as autoregressive moving average (ARMA).

Consider a sine wave at 1 MHz, and a second sine wave at 1 MHz but delayed 125 ns. The situation is sketched in Figure 16.21 in the margin. The 125 ns delay could be expressed as a phase shift of 45° at 1 MHz. However, if the time delay remains constant and the frequency doubles, the phase offset doubles to 90°. With constant time delay, phase offset increases in direct (linear) proportion to the increase in frequency. Since in this condition phase shift is directly proportional to frequency, its synonym is linear phase. A closely related condition is constant group delay, where the first derivative of phase with respect to frequency is constant but a fixed phase offset may be present. Symmetric and antisymmetric FIR filters exhibit constant group delay, but only symmetric FIR filters exhibit strictly linear phase.
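Both halves of this argument can be checked with a little arithmetic. A fixed 125 ns delay corresponds to 360° × f × 125 ns, which is 45° at 1 MHz and 90° at 2 MHz. For a symmetric FIR filter, the phase divided by frequency is the same at every frequency. In the sketch below, the taps are numbered 0 through 14, so the central tap is tap 7 and the delay is 7 samples. The triangular weights are a hypothetical example whose response magnitude never changes sign, which keeps the phase free of extra 180° jumps at the test frequencies:

```python
import cmath, math

# Illustrative sketch; the weights and the helper name response are hypothetical.
delay = 125e-9
for f_hz in (1e6, 2e6):
    print(f"{f_hz / 1e6:.0f} MHz: {360 * f_hz * delay:.0f} degrees")

h = [1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1]   # 15 symmetric taps, center at tap 7

def response(w):
    """Complex response at w radians per sample, with tap 0 at time 0."""
    return sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(h))

for w in (0.1, 0.2, 0.4):
    print(f"w = {w}: phase / w = {cmath.phase(response(w)) / w:.6f} samples")   # -7 each time
```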

It is characteristic of many filters - such as IIR filters, to be described in a moment - that delay varies somewhat as a function of frequency. An image signal contains many frequencies, produced by scene elements at different scales. If the horizontal displacement of a reproduced object were dependent upon frequency, objectionable artifacts would result. Symmetric FIR filters exhibit linear phase in their passbands, and avoid this artifact. So, in image processing and in video, FIR filters are strongly preferred over other sorts of filters: Linear phase is a highly desirable property in a video system.

Infinite impulse response (IIR) filters

The digital filters described so far have been members of the FIR class. A second class of digital filter is characterized by having a potentially infinite impulse response (IIR). An IIR (or recursive) filter computes a weighted sum of input samples - as is the case in an FIR filter - but adds to this a weighted sum of previous output samples.

A simple IIR filter is sketched in Figure 16.22: The input sample is weighted by ¼, and the previous output is weighted by ¾. These weighted values are summed to form the filter result. The filter result is then fed back to become an input to the computation of the next sample. The impulse response jumps rapidly upon the onset of the input impulse, and tails off over many samples. This is a simple one-tap lowpass filter; its time-domain response closely resembles an analog RC lowpass filter. A highpass filter is formed by taking the difference of the input sample from the previously stored filter result.
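The Figure 16.22 filter can be sketched as a one-line recursion, y[n] = ¼·x[n] + ¾·y[n−1]. The function name is hypothetical. The impulse response jumps to ¼ at once and then shrinks by a factor of ¾ every sample, never reaching exactly zero. Because the two weights sum to one, a steady input settles to the same steady output, which gives unity gain at DC:

```python
# Illustrative sketch; one_pole_lowpass is a hypothetical helper name.
def one_pole_lowpass(x, a=0.25, b=0.75):
    """y[n] = a * x[n] + b * y[n-1]: the recursive filter of Figure 16.22."""
    y, prev = [], 0.0
    for sample in x:
        prev = a * sample + b * prev
        y.append(prev)
    return y

impulse = [1.0] + [0.0] * 11
h = one_pole_lowpass(impulse)
print(h[:4])                       # [0.25, 0.1875, 0.140625, 0.10546875]
step = one_pole_lowpass([1.0] * 40)
print(step[-1])                    # close to 1: unity gain at DC
```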
Compensation of undesired
phase response in a filter is 
known as equalization. This is 
unrelated to the equalization 
pulses that form part of sync. 



The terms nonrecursive and recur- 
sive are best used to describe filter 
implementation structures. 
Figure 16.22 IIR ("recursive") filter computes a weighted sum of input samples (here, just ¼ times the current sample), and adds to this a weighted sum of previous result samples. Every IIR filter exhibits nonlinear phase response.

In an IIR filter having just one tap, the designer's ability to tailor frequency response is severely limited. An IIR filter can be extended by storing several previous filter results, and adding (or subtracting) a fraction of each to a fraction of the current input sample. In such a multi-tap IIR filter, a fine degree of control can be exercised over frequency response using just a handful of taps. Just three or four taps in an IIR filter can achieve frequency response that might take 20 taps in an FIR filter.

However, there's a catch: In an IIR filter, both attenuation and delay depend upon frequency. In the terminology of the previous section, an IIR filter exhibits nonlinear phase. Typically, low-frequency signals are delayed more than high-frequency signals. As I have explained, variation of delay as a function of frequency is potentially a very serious problem in video.
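The frequency-dependent delay of the one-tap filter of Figure 16.22 can be computed from its response, H(ω) = ¼ / (1 − ¾e^{−jω}). Group delay is minus the derivative of phase with respect to frequency. This sketch estimates it with a central finite difference; the helper names and the step size are illustrative. The delay is 3 samples at DC and falls steadily with frequency, reaching about −0.43 samples at half the sample rate:

```python
import cmath, math

# Illustrative sketch; response and group_delay are hypothetical helper names.
def response(w, a=0.25, b=0.75):
    """Frequency response of y[n] = a x[n] + b y[n-1]: H = a / (1 - b exp(-jw))."""
    return a / (1 - b * cmath.exp(-1j * w))

def group_delay(w, dw=1e-5):
    """Group delay in samples: minus the derivative of phase with respect to w."""
    return -(cmath.phase(response(w + dw)) - cmath.phase(response(w - dw))) / (2 * dw)

for f in (0.0, 0.1, 0.25, 0.5):
    print(f"{f:.2f} fs: group delay {group_delay(2 * math.pi * f):.3f} samples")
```

An FIR filter with symmetric weights would report the same delay at every frequency.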

An IIR filter cannot have exactly linear phase, although a complex IIR filter can be designed to have arbitrarily small phase error. Because IIR filters usually have poor phase response, they are not ordinarily used in video.

(A notable exception is the use of field- and frame-based IIR filters in temporal noise reduction, where the delay element comprises a field or frame of storage.)

Owing to the dependence of an IIR filter's result upon its previous results, an IIR filter is necessarily recursive. However, certain recursive filters have finite impulse response, so a recursive filter does not necessarily have infinite impulse response.
Figure 16.23 Lowpass filter characterization. A lowpass filter for use in video sampling or reconstruction has a corner frequency $\omega_c$, where the attenuation is 0.707. (At the corner frequency, output power is half the input power.) In the passband, response is unity within $\delta_P$, usually 1% or so. In the stopband, response is zero within $\delta_S$, usually 1% or so. The transition band lies between the edge of the passband and the edge of the stopband; its width is $\Delta\omega$.



Here I represent frequency by the symbol ω, whose units are radians per second (rad·s⁻¹).

A digital filter scales with its sampling frequency; using ω is convenient because the sampling frequency is always ω = 2π and the half-sampling (Nyquist) frequency is always π.

Some people define band- 
width differently than I do. 



Lowpass filter 

A lowpass filter lets low frequencies pass undisturbed, but attenuates high frequencies. Figure 16.23 above characterizes a lowpass filter. The response has a passband, where the filter's response is nearly unity; a transition band, where the response has intermediate values; and a stopband, where the filter's response is nearly zero. For a lowpass filter, the corner frequency $\omega_c$ (sometimes called bandwidth, or cutoff frequency) is the frequency where the magnitude response of the filter has fallen 3 dB from its magnitude at a reference frequency (usually zero, or DC). In other words, at its corner frequency, the filter's response has fallen to 0.707 of its response at DC.

The passband is characterized by the passband edge frequency $\omega_P$ and the passband ripple $\delta_P$ (sometimes denoted $\delta_1$). The stopband is characterized by its edge frequency $\omega_S$ and ripple $\delta_S$ (sometimes denoted $\delta_2$). The transition band lies between $\omega_P$ and $\omega_S$; it has width $\Delta\omega$.
Bellanger, Maurice, Digital Processing of Signals: Theory and Practice, Third Edition (Chichester, England: Wiley, 2000), 124.

The complexity of a lowpass filter is roughly determined by its relative transition bandwidth (or transition ratio) $\Delta\omega/\omega_S$. The narrower the transition band, the more complex the filter. Also, the smaller the ripple in either the passband or the stopband, the more complex the filter. FIR filter tap count $N$ can be estimated by this formula, due to Bellanger, where lg denotes the base-10 logarithm and $2\pi$ is the sampling frequency:

$$
N \approx \frac{2}{3}\,\lg\!\left(\frac{1}{10\,\delta_P\,\delta_S}\right)\frac{2\pi}{\Delta\omega} \qquad\text{(Eq 16.3)}
$$

In analog filter design, frequency response is generally graphed in log-log coordinates, with the frequency axis in units of log hertz (Hz), and magnitude response in decibels (dB). In digital filter design, frequency is usually graphed linearly from zero to half the sampling frequency. The passband and stopband response of a digital filter are usually graphed logarithmically; the passband response is often magnified to emphasize small departures from unity.
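Bellanger's estimate (Eq 16.3) is easy to evaluate numerically. The sketch below is a minimal illustration; the function name `bellanger_taps` and the example specification (27 MHz sampling, 5.5 MHz passband edge, 6.75 MHz stopband edge, 1% and 0.1% ripple) are assumptions chosen for illustration, not figures from a standard. Ripple is given as a linear fraction, not in decibels, and frequencies in any consistent unit, since only the ratio of sampling frequency to transition width matters.

```python
import math

def bellanger_taps(delta_p: float, delta_s: float,
                   transition_width: float, sample_rate: float) -> float:
    """Approximate FIR tap count from Bellanger's estimate (Eq 16.3).

    delta_p, delta_s: passband and stopband ripple as linear fractions
    (0.01 means 1%), not decibels.
    transition_width, sample_rate: any consistent units (Hz, or rad/sample
    with sample_rate = 2*pi); only their ratio matters.
    """
    ripple_term = math.log10(1.0 / (10.0 * delta_p * delta_s))
    return (2.0 / 3.0) * ripple_term * sample_rate / transition_width

# Hypothetical specification: 27 MHz sampling, passband edge 5.5 MHz,
# stopband edge 6.75 MHz, 1% passband ripple, 0.1% stopband ripple.
estimate = bellanger_taps(0.01, 0.001, 6.75e6 - 5.5e6, 27e6)
print(round(estimate))   # 58
```

Halving the transition width doubles the estimate, whereas tightening either ripple adds taps only logarithmically.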



I describe risetime on page 543. 

In response to a step input, a Gaussian filter has a risetime very close to 1/3 of the period of one cycle at the corner frequency.



The templates standardized in Rec. 601 for a studio digital video presampling filter are shown in Figure 16.24 overleaf. The response of a practical lowpass filter meeting this template is shown in Figure 16.25, on page 166. This is a half-band filter, intended for use with a sampling frequency of 27 MHz; its corner frequency is $0.25 f_s$. A consumer filter might have ripple two orders of magnitude worse than this.

Digital filter design 

A simple way to design a digital filter is to use coeffi- 
cients that comprise an appropriate number of point- 
samples of a theoretical impulse response. Coefficients 
beyond a certain point - the order of the filter - are 
simply omitted. Equation 16.4 implements a 9-tap filter 
that approximates a Gaussian: 



$$
g_i = \frac{1\,s_{i-4} + 9\,s_{i-3} + 43\,s_{i-2} + 110\,s_{i-1} + 150\,s_i + 110\,s_{i+1} + 43\,s_{i+2} + 9\,s_{i+3} + 1\,s_{i+4}}{476} \qquad\text{(Eq 16.4)}
$$

Omission of coefficients causes frequency response to 
depart from the ideal. If the omitted coefficients are 
much greater than zero, actual frequency response can 
depart significantly from the ideal. 
Figure 16.24 Rec. 601 filter templates are standardized for studio digital video systems in Rec. 601-5. The top template shows frequency response (frequency axis in MHz), detailing the passband (at the top) and the stopband (in the middle). The bottom template shows the group delay specification.
Another approach to digital filter design starts with the ideal lowpass filter (ILPF). Its infinite extent can be addressed by simply truncating the weights (that is, forcing the weights to zero) outside a certain interval, say outside the region 0 ± 4 sample periods. This will have an unfortunate effect on the frequency response, however: The frequency response will exhibit overshoot and undershoot near the transition band.



We could use the term weighting, but sinc itself is a weighting function, so we choose a different word: windowing.



For details about windowing, see Lyons or Rorabaugh, cited on page 170, or Wolberg, George, Digital Image Warping (Los Alamitos, Calif.: IEEE, 1990).



Poor spectral behavior of a truncated sinc can be mitigated by applying a weighting function that peaks at unity at the center of the filter and diminishes gently to zero at the extremities of the interval. This is referred to as applying a windowing function. Design of a filter using the windowing method begins with scaling sinc along the time axis to choose the corner frequency, and choosing a suitable number of taps. Each tap weight is then computed as a sinc value multiplied by the corresponding window value. Simple truncation of a sinc is equivalent to multiplying it by a rectangular window.

Perhaps the simplest nontrivial window has a triangular shape; this is also called the Bartlett window. The von Hann window (often wrongly called "Hanning") has a windowing function that is a single cycle of a raised cosine. Window functions such as von Hann are fixed by the corner frequency and the number of filter taps; no control can be exercised over the width of the transition band. The Kaiser window has a single parameter that controls that width. For a given filter order, if the transition band is made narrower, then stopband attenuation is reduced. The Kaiser window parameter allows the designer to determine this tradeoff.



A windowed sinc filter has much better performance than a truncated sinc, and windowed design is so simple that there is no excuse to use sinc without windowing. In most engineering applications, however, filter performance is best characterized in the frequency domain, and the frequency-domain performance of windowed sinc filters is suboptimal: The performance of an n-tap windowed sinc filter can be bettered by an n-tap filter whose design has been suitably optimized.
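The windowing method described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production design tool: the function name `windowed_sinc_lowpass` is hypothetical, the von Hann window is placed so that it reaches zero one step beyond the outermost taps, and the taps are normalized for unity gain at DC.

```python
import math

def windowed_sinc_lowpass(num_taps: int, corner: float) -> list[float]:
    """Lowpass FIR coefficients by the windowing method (von Hann window).

    corner: corner frequency as a fraction of the sampling rate (0.25 = f_s/4).
    num_taps: odd, so that the filter has a centre tap and is symmetric.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    half = num_taps // 2
    taps = []
    for k in range(-half, half + 1):
        # Ideal lowpass (sinc) impulse response, scaled along the time axis
        # so that its cutoff falls at corner * f_s.
        if k == 0:
            ideal = 2.0 * corner
        else:
            ideal = math.sin(2.0 * math.pi * corner * k) / (math.pi * k)
        # von Hann window: one cycle of a raised cosine, unity at the centre,
        # reaching zero one step beyond the outermost taps.
        window = 0.5 + 0.5 * math.cos(math.pi * k / (half + 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]   # normalize for unity gain at DC

taps = windowed_sinc_lowpass(25, 0.25)
```

Replacing the window line with `window = 1.0` gives the plain truncated sinc, which makes it easy to compare the two designs' stopband behavior.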
Figure 16.25 Half-band filter. This graph shows the frequency response of a practical filter whose corner is at one-quarter its sampling frequency of 27 MHz. The graph is linear in the abscissa (frequency) and logarithmic in the ordinate (response). The top portion shows that the passband has an overall gain of unity and a uniformity (ripple) of about ±0.02 dB: In the passband, its gain varies between about 0.997 and 1.003. The bottom portion shows that the stopband is rejected with an attenuation of about -60 dB: The filter has a gain of about 0.001 at these frequencies. This data, for the GF9102A halfband filter, was kindly provided by Gennum Corporation.
Figure 16.26 FIR filter example, 25-tap lowpass

$$
\begin{aligned}
g_i = {}& 0.098460\,s_{i-12} + 0.009482\,s_{i-11} - 0.013681\,s_{i-10} + 0.020420\,s_{i-9} \\
& - 0.029197\,s_{i-8} + 0.039309\,s_{i-7} - 0.050479\,s_{i-6} + 0.061500\,s_{i-5} \\
& - 0.071781\,s_{i-4} + 0.080612\,s_{i-3} - 0.087404\,s_{i-2} + 0.091742\,s_{i-1} \\
& + 0.906788\,s_i \\
& + 0.091742\,s_{i+1} - 0.087404\,s_{i+2} + 0.080612\,s_{i+3} - 0.071781\,s_{i+4} \\
& + 0.061500\,s_{i+5} - 0.050479\,s_{i+6} + 0.039309\,s_{i+7} - 0.029197\,s_{i+8} \\
& + 0.020420\,s_{i+9} - 0.013681\,s_{i+10} + 0.009482\,s_{i+11} + 0.098460\,s_{i+12}
\end{aligned}
$$



Few closed-form methods are known to design optimum digital filters. Design of a high-performance filter usually involves successive approximation, optimizing by trading design parameters back and forth between the time and frequency domains. The classic method was published by J.H. McClellan, T.W. Parks, and L.R. Rabiner ("MPR"), based upon an algorithm developed by the Russian mathematician E.Y. Remez. In the DSP community, the method is often called the "Remez exchange."

The coefficients of a high-quality lowpass filter for 
studio video are shown in Figure 16.26 in the margin. 

Reconstruction 

Digitization involves sampling and quantization; these 
operations are performed in an analog-to-digital 
converter (ADC). Whether the signal is quantized then 
sampled, or sampled then quantized, is relevant only 
within the ADC: The order of operations is immaterial 
outside that subsystem. Modern video ADCs quantize 
first, then sample. 

I have explained that filtering is generally required prior 
to sampling in order to avoid the introduction of 
aliases. Avoidance of aliasing in the sampled domain 
has obvious importance. In order to avoid aliasing, an 
analog presampling filter needs to operate prior to 
analog-to-digital conversion. If aliasing is avoided, then 
the sampled signal can, according to Shannon's 
theorem, be reconstructed without aliases. 

To reconstruct an analog signal, an analog reconstruc- 
tion filter is necessary following digital-to-analog 
(D-to-A) conversion. The overall flow is sketched in 
Figure 16.27. 



Figure 16.27 Sampling and reconstruction

Figure 16.28 Reconstruction close to $0.5 f_s$

Reconstruction close to $0.5 f_s$

Consider the example in Figure 16.28 of a sine wave at $0.44 f_s$. This signal meets the sampling criterion, and can be perfectly represented in the digital domain. However, from an intuitive point of view, it is difficult to predict the underlying sine wave from samples 3, 4, 5, and 6 in the lower graph. When reconstructed using a Gaussian filter, the high-frequency signal vanishes. To be reconstructed accurately, a waveform with a significant amount of power near half the sampling rate must be reconstructed with a high-quality filter.
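The claim that Gaussian reconstruction makes a $0.44 f_s$ signal vanish can be checked by evaluating the frequency response of the 9-tap Gaussian of Eq 16.4. In this minimal sketch the helper name `symmetric_fir_gain` is hypothetical; it uses the fact that a symmetric FIR filter's response, apart from a constant delay, is a sum of cosines.

```python
import math

# Numerators of Eq 16.4 for offsets -4 .. +4; the divisor 476 is their sum.
GAUSS_9TAP = [1, 9, 43, 110, 150, 110, 43, 9, 1]

def symmetric_fir_gain(taps: list[float], freq: float) -> float:
    """Magnitude response of a symmetric (linear-phase) FIR filter.

    freq is a fraction of the sampling rate (0.5 = half-sampling frequency).
    Taps are normalized by their sum, so the gain at DC is 1. For a symmetric
    filter the response, apart from its constant delay, is a sum of cosines.
    """
    centre = len(taps) // 2
    omega = 2.0 * math.pi * freq
    h = sum(t * math.cos(omega * (k - centre)) for k, t in enumerate(taps))
    return abs(h) / sum(taps)

print(symmetric_fir_gain(GAUSS_9TAP, 0.0))    # 1.0
print(symmetric_fir_gain(GAUSS_9TAP, 0.44))   # about 0.0013
```

At $0.44 f_s$ the gain is roughly a thousandth, so essentially nothing of the sine wave survives this filter.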

(sin x)/x correction 

I have described how it is necessary for an analog 
reconstruction filter to follow digital-to-analog conver- 
sion. If the DAC produced an impulse "train" where the 
amplitude of each impulse was modulated by the corre- 
sponding code value, a classic lowpass filter would 
suffice: All would be well if the DAC output resembled 
my "point" graphs, with power at the sample instants 
and no power in between. Recall that a waveform 
comprising just unit impulses has uniform frequency 
response across the entire spectrum. 

Unfortunately for analog reconstruction, a typical DAC does not produce an impulse waveform for each sample. It would be impractical to have a DAC with an impulse response, because signal power is proportional to the integral of the signal, and the amplitude of the impulses would have to be impractically high for the integral of the impulses to achieve adequate signal power. Instead, each converted sample value is held for the entire duration of the sample: A typical DAC produces a boxcar waveform. A boxcar waveform's frequency response is described by the sinc function.

Figure 16.29 D-to-A conversion with boxcar waveform is equivalent to a DAC producing an impulse train followed by a boxcar filter with its (sin x)/x response. Frequencies close to $0.5 f_s$ are attenuated.

You might consider a DAC's boxcar waveform to be a "sample-and-hold" operation, but that term is normally used in conjunction with an A-to-D converter, or circuitry that lies in front of an ADC.



In Figure 16.29 above, the top graph is a sine wave at $0.44 f_s$; the bottom graph shows the boxcar waveform produced by a conventional DAC. Even with a high-quality reconstruction filter, whose response extends close to half the sampling rate, it is evident that reconstruction by a boxcar function reduces the magnitude of high-frequency components of the signal.



The DAC's holding of each sample value throughout the 
duration of its sample interval corresponds to a filtering 
operation, with a frequency response of (sin x)/x. The 
top graph of Figure 16.30 overleaf shows the attenua- 
tion due to this phenomenon. 



The effect is overcome by (sin x)/x correction: The frequency response of the reconstruction filter is modified to include peaking corresponding to the reciprocal of (sin x)/x. In the passband, the filter's response increases gradually to about 4 dB above its response at DC, to compensate the loss. Above the passband edge frequency, the response of the filter must decrease rapidly to produce a large attenuation near half the sampling frequency, to provide alias-free reconstruction. The bottom graph of Figure 16.30 shows the idealized response of a filter having (sin x)/x correction.

Figure 16.30 (sin x)/x correction is necessary following (or in principle, preceding) digital-to-analog conversion when a DAC with a typical boxcar output waveform is used. The frequency response of a boxcar-waveform DAC is shown in the upper graph. The lower graph shows the response of the (sin x)/x correction filter necessary to compensate its high-frequency falloff.
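The size of the boxcar loss, and hence of the peaking needed to undo it, follows directly from (sin x)/x. This minimal sketch assumes the DAC holds each value for a full sample period; the function names `boxcar_gain_db` and `correction_db` are hypothetical.

```python
import math

def boxcar_gain_db(freq: float) -> float:
    """Gain in dB of a DAC that holds each sample for a full sample period.

    freq is a fraction of the sampling rate. The hold acts as a boxcar filter
    whose response is (sin x)/x with x = pi * freq.
    """
    if freq == 0.0:
        return 0.0
    x = math.pi * freq
    return 20.0 * math.log10(math.sin(x) / x)

def correction_db(freq: float) -> float:
    """Peaking a (sin x)/x corrector must add: the reciprocal response, in dB."""
    return -boxcar_gain_db(freq)

for f in (0.1, 0.25, 0.4, 0.5):
    print(f"{f:.2f} f_s: DAC {boxcar_gain_db(f):6.2f} dB, "
          f"correction {correction_db(f):+.2f} dB")
```

At half the sampling rate the hold loses about 3.9 dB, which accounts for the roughly 4 dB of peaking described above.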



This chapter has detailed one-dimensional filtering. In 
Image digitization and reconstruction, I will introduce 
two- and three-dimensional sampling and filters. 



Lyons, Richard G., Understanding 
Digital Signal Processing (Reading, 
Mass.: Addison Wesley, 1997). 

Rorabaugh, C. Britton, DSP Primer 
(New York: McGraw-Hill, 1999). 

Mitra, Sanjit K., and James F. Kaiser, 
Handbook for Digital Signal 
Processing (New York: Wiley, 1993). 



Further reading 

For an approachable introduction to the concepts, 
theory, and mathematics of digital signal processing 
(DSP), see Lyons. For an alternative point of view, see 
Rorabaugh's book; it includes the source code for 
programs to design filters - that is, to evaluate filter 
coefficients. For comprehensive and theoretical 
coverage of DSP, see Mitra and Kaiser. 
In video and audio signal processing, it is often necessary to take a set of sample values and produce another set that approximates the samples that would have resulted had the original sampling occurred at different instants: at a different rate, or at a different phase.

This is called resampling. (In PC parlance, resampling 
for the purpose of picture resizing is called scaling.) 
Resampling is an essential part of video processes such 
as these: 

• Chroma subsampling (e.g., 4:4:4 to 4:2:2) 

• Downconversion (e.g., HDTV to SDTV) and upconver- 
sion (e.g., SDTV to HDTV) 

• Aspect ratio conversion (e.g., 4:3 to 16:9) 

• Conversion among different sample rates of digital video standards (e.g., $4f_{SC}$ to 4:2:2, 13.5 MHz)

• Picture resizing in digital video effects (DVE) 

One-dimensional resampling applies directly to digital 
audio, in applications such as changing sample rate 
from 48 kHz to 44.1 kHz. In video, 1-D resampling can 
be applied horizontally or vertically. Resampling can be 
extended to a two-dimensional array of samples. Two 
approaches are possible. A horizontal filter, then 
a vertical filter, can be applied in cascade (tandem) - 
this is the separable approach. Alternatively, a direct 
form of 2-D spatial interpolation can be implemented. 
I write resampling ratios in the form input samples:output samples. With my convention, a ratio less than unity is upsampling.



Upsampling produces more result samples than input 
samples. In audio, new samples can be estimated at 
a higher rate than the input, for example when digital 
audio sampled at 44.1 kHz is converted to the 48 kHz 
professional rate used with video. In video, upsampling 
is required in the spatial upconversion from 1280x720 
HDTV to 1920x1080 HDTV: 1280 samples in each 
input line must be converted to 1920 samples in the 
output, an upsampling ratio of 2:3. 

One way to accomplish upsampling by an integer ratio of 1:n is to interpose n - 1 zero samples between each pair of input samples. This causes the spectrum of the original signal to repeat at multiples of the original sampling rate. The repeated spectra are called "images." (This is a historical term stemming from radio; it has nothing to do with pictures!) These "images" are then eliminated (or at least attenuated) by an anti-imaging lowpass filter. In some upsampling structures, such as the Lagrange interpolator that I will describe later in this chapter, filtering and upsampling are intertwined.
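Zero insertion followed by an anti-imaging filter can be sketched as follows. The function name `upsample_1_to_n` is hypothetical, and the triangle [0.5, 1, 0.5] used in the example is a deliberately crude anti-imaging filter: with n = 2 it reduces to linear interpolation, which makes the result easy to check but leaves considerable image energy. Its taps sum to n, restoring the signal level diluted by the inserted zeros.

```python
def upsample_1_to_n(samples: list[float], n: int,
                    anti_imaging: list[float]) -> list[float]:
    """1:n upsampling: interpose n-1 zeros, then apply an anti-imaging FIR.

    anti_imaging is an odd-length, centred filter; output index i*n
    corresponds to input index i. Its taps should sum to n for unity gain.
    """
    stuffed = []
    for s in samples:
        stuffed.append(s)
        stuffed.extend([0.0] * (n - 1))          # spectra now repeat ("images")
    stuffed = stuffed[: len(stuffed) - (n - 1)]  # no zeros after the last sample
    half = len(anti_imaging) // 2
    out = []
    for i in range(len(stuffed)):
        acc = 0.0
        for k, c in enumerate(anti_imaging):
            j = i + k - half
            if 0 <= j < len(stuffed):            # treat samples beyond the ends as zero
                acc += c * stuffed[j]
        out.append(acc)
    return out

print(upsample_1_to_n([0.0, 2.0, 4.0], 2, [0.5, 1.0, 0.5]))
# [0.0, 1.0, 2.0, 3.0, 4.0]
```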

Downsampling produces fewer result samples than 
input samples. In audio, new samples can be created at 
a lower rate than the input. In video, downsampling is 
required when converting $4f_{SC}$ NTSC digital video to
Rec. 601 ("4:2:2") digital video: 910 samples in each 
input line must be converted to 858 samples in the 
output, a downsampling ratio of 35:33; for each 35 
input samples, 33 output samples are produced. 
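Reducing a pair of line lengths to the resampling ratio used in the text is a greatest-common-divisor calculation. This small sketch (the function name `resampling_ratio` is hypothetical) follows the input:output convention stated in the margin.

```python
from math import gcd

def resampling_ratio(input_samples: int, output_samples: int) -> str:
    """Reduce a resampling ratio, written input:output as in the text."""
    d = gcd(input_samples, output_samples)
    return f"{input_samples // d}:{output_samples // d}"

print(resampling_ratio(1280, 1920))    # 2:3, upsampling (ratio below unity)
print(resampling_ratio(910, 858))      # 35:33, downsampling
print(resampling_ratio(44100, 48000))  # 147:160, 44.1 kHz audio to 48 kHz
```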

In an original sample sequence, signal content from DC to nearly $0.5 f_s$ can be represented. After downsampling, though, the new sample rate may be lower than that required by the signal bandwidth. After downsampling, meaningful signal content is limited by the Nyquist criterion at the new sampling rate; for example, after 4:1 downsampling, signal content is limited to 1/8 of the original sampling rate. To avoid the introduction of aliases, lowpass filtering is necessary prior to, or in conjunction with, downsampling. The corner frequency depends upon the downsampling ratio; for example, a 4:1 ratio requires a corner less than $0.125 f_s$. Downsampling with an integer ratio of n:1 can be thought of as prefiltering (antialias filtering) for the new sampling rate, followed by discarding n - 1 of every n samples.

Figure 17.1 Two-times upsampling starts by interposing zero samples between original sample pairs. This would result in the folded spectral content of the original signal appearing in-band at the new rate. These "images" are removed by a resampling filter. [Graph: folded spectrum ("image") prior to the resampling (anti-imaging) filter, and folded spectrum following the resampling filter; frequency axis 0 to 1.0, in units of the 1:2-upsampled $f_s$.]

Figure 17.2 Original signal exhibits folding around half the sampling frequency. This is inconsequential provided that the signal is properly reconstructed. When the signal is upsampled or downsampled, the folded portion must be handled properly or aliasing will result. [Graph: folding around the half-sampling frequency; frequency axis 0 to 1.0, in units of the original $f_s$.]

Figure 17.3 Two-to-one downsampling requires a resampling filter to meet the Nyquist criterion at the new sampling rate. The solid line shows the spectrum of the filtered signal; the gray line shows its folded portion. Resampling without filtering would preserve the original baseband spectrum, but folding around half the new sampling rate would cause the alias products shown here in the crosshatched region. [Graph: signal spectrum, folded spectrum, and alias products without the resampling (anti-aliasing) filter; frequency axis 0 to 1, in units of the 2:1-downsampled $f_s$.]



Figure 17.2, at the center above, sketches the spectrum of an original signal. Figure 17.1 shows the frequency-domain considerations of upsampling; Figure 17.3 shows the frequency-domain considerations of downsampling. These examples show ratios of 1:2 and 2:1, but these concepts apply to resampling at any ratio.
2:1 downsampling

Color video originates with R'G'B' components. Transcoding to $Y'C_BC_R$ is necessary if signals are to be used in the studio. The conversion involves matrixing (to $Y'C_BC_R$ in 4:4:4 form), then chroma subsampling to 4:2:2. Chroma subsampling requires a 2:1 downsampler. If this downsampling is attempted by simply dropping alternate samples, any signal content between the original $0.25 f_s$ and $0.5 f_s$ will cause aliasing in the result. Rejection of signal content at and above $0.25 f_s$ is required. The required filter is usually implemented as an FIR lowpass filter having its corner frequency somewhat less than one-quarter of the (original) sampling frequency. After filtering, alternate result samples can be dropped. There is no need to calculate values that will subsequently be discarded, however! Efficient chroma subsamplers take advantage of that fact, interleaving the $C_B$ and $C_R$ components into a single filter.

In Figure 16.12, on page 153, I presented a very simple lowpass filter that simply averages two adjacent samples. That filter has a corner frequency of $0.25 f_s$. However, it makes a slow transition from passband to stopband, and it has very poor attenuation in the stopband (above $0.25 f_s$). It makes a poor resampling filter.
More than two taps are required to give adequate 
performance in studio video subsampling. 
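The weakness of the two-sample average is easy to quantify: its response is $|\cos(\omega/2)|$. This sketch (the function name `two_tap_average_gain` is hypothetical) evaluates it at a few frequencies.

```python
import math

def two_tap_average_gain(freq: float) -> float:
    """Gain of g_i = (s_i + s_{i+1}) / 2 at freq, a fraction of f_s.

    H(w) = (1 + e^{-jw}) / 2, so |H| = |cos(w / 2)| with w = 2*pi*freq.
    """
    return abs(math.cos(math.pi * freq))

for f in (0.0, 0.25, 0.375, 0.45):
    gain = two_tap_average_gain(f)
    print(f"{f:.3f} f_s: gain {gain:.3f} ({20 * math.log10(gain):.1f} dB)")
```

The gain is 0.707 at the $0.25 f_s$ corner, but at $0.375 f_s$, in the middle of the band a 2:1 downsampler must reject, it is still about 0.38 (roughly -8 dB), far short of useful stopband attenuation.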

In 4:2:2 video, chroma is cosited: Each chroma sample 
must be located at the site of a luma sample. A sym- 
metrical filter having an even number of (nonzero) taps 
does not have this property. A downsampling filter for 
cosited chroma must have an odd number of taps. 
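A cosited 2:1 chroma subsampler that computes only the samples it keeps might look like the sketch below. Assumptions, all hypothetical: the function name `subsample_chroma_cosited`, edge handling by repeating end samples, and the 3-tap [1, 2, 1]/4 kernel in the example, which is far too short for studio use but shows the structure. It processes one chroma component; it does not show the $C_B$/$C_R$ interleaving mentioned above.

```python
def subsample_chroma_cosited(chroma: list[float], taps: list[float]) -> list[float]:
    """2:1 cosited chroma subsampling of one row of one chroma component.

    taps must have odd length so each output is centred on an even-indexed
    input (luma) site. Only the retained outputs are computed; odd-indexed
    results would be discarded anyway. Edges repeat the end samples.
    """
    if len(taps) % 2 == 0:
        raise ValueError("cosited subsampling needs an odd number of taps")
    half = len(taps) // 2
    last = len(chroma) - 1
    out = []
    for centre in range(0, len(chroma), 2):            # even (cosited) sites only
        acc = 0.0
        for k, c in enumerate(taps):
            j = min(max(centre + k - half, 0), last)   # clamp at the edges
            acc += c * chroma[j]
        out.append(acc)
    return out

row = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
print(subsample_chroma_cosited(row, [0.25, 0.5, 0.25]))   # [12.5, 30.0, 50.0]
```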

Oversampling 

I have explained the importance of prefiltering prior to A-to-D conversion, and of postfiltering following D-to-A conversion. Historically, these filters were implemented in the analog domain, using inductors and capacitors. In discrete form, these components are bulky and expensive. It is extremely difficult to incorporate inductive and capacitive elements with suitable values and precision onto integrated circuits. However, A-to-D and D-to-A converters are operating at higher and higher rates, and digital arithmetic has become very inexpensive. These circumstances have led to the emergence of oversampling as an economical alternative to complex analog presampling ("antialiasing") and postsampling (reconstruction) filters.

Figure 17.4 Analog filter for direct sampling must meet tight constraints, making it expensive.

Figure 17.5 Analog filter for 2x-oversampling is much less demanding than a filter for direct sampling, because the difficult part of filtering (achieving a response comparable to that of Figure 17.4) is relegated to the digital domain.



For an explanation of transition ratio, see page 163.



The characteristics of a conventional analog presampling filter are critical: Attenuation must be quite low up to about 0.4 times the sample rate, and quite high above that. In a presampling filter for studio video, attenuation must be less than 1 dB or so up to about 5.5 MHz, and better than 40 or 50 dB above 6.75 MHz. This is a demanding transition ratio $\Delta\omega/\omega_S$. Figure 17.4 above (top) sketches the filter template of a conventional analog presampling filter.



In certain FIR filters whose corner is exactly $0.25 f_s$, half the coefficients are zero. This leads to a considerable reduction in complexity.

In the common case of interpolation horizontally across an image row, the argument x is horizontal position. Interpolating along the time axis, as in digital audio sample rate conversion, you could use the symbol t to represent time.

In computer graphics, the linear interpolation operation is often called LIRP (pronounced lerp).

An oversampling A-to-D converter operates at a multiple of the ultimate sampling rate, say at 27 MHz, twice the rate of Rec. 601 video. The converter is preceded by a cheap analog filter that severely attenuates components at 13.5 MHz and above. However, its characteristics between 5.5 MHz and 13.5 MHz are not critical. The demanding aspects of filtering in that region are left to a digital 2:1 downsampler. The transition ratio $\Delta\omega/\omega_S$ of the analog filter is greatly relaxed compared to direct conversion. In today's technology, the cost of the digital downsampler is less than the difference in cost between excellent and mediocre analog filtering. Complexity is moved from the analog domain to the digital domain; total system cost is reduced. Figure 17.5 (on page 175) sketches the template of an analog presampling filter appropriate for use preceding a 2x-oversampled A-to-D converter.
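The relaxation can be put in numbers using the transition ratio $\Delta\omega/\omega_S$. This sketch uses the frequencies quoted in the text and assumes the passband edge stays at 5.5 MHz in both cases; the function name `transition_ratio` is hypothetical.

```python
def transition_ratio(passband_edge: float, stopband_edge: float) -> float:
    """Relative transition bandwidth (delta omega / omega_S); any consistent units."""
    return (stopband_edge - passband_edge) / stopband_edge

# Frequencies from the text, in MHz.
direct = transition_ratio(5.5, 6.75)        # analog filter for direct 13.5 MHz sampling
oversampled = transition_ratio(5.5, 13.5)   # analog filter ahead of a 27 MHz converter
print(f"direct: {direct:.3f}  2x oversampled: {oversampled:.3f}")
```

The oversampled analog filter's transition ratio is more than three times wider (about 0.59 against 0.19), which is what makes a cheap analog filter acceptable.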

Figure 16.25, on page 166, showed the response of a 55-tap filter having a corner frequency of $0.25 f_s$. This is a halfband filter, intended for use following a 2x-oversampled A-to-D converter.
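Why half the coefficients of such a filter vanish can be seen from a windowed-sinc design with its corner at exactly $0.25 f_s$: the ideal response $\sin(\pi k/2)/(\pi k)$ is zero at every even offset except the centre. This is a sketch under assumptions (a von Hann window, unnormalized taps, hypothetical function name `halfband_taps`); it is not how the filter of Figure 16.25 was designed.

```python
import math

def halfband_taps(num_taps: int) -> list[float]:
    """Hann-windowed sinc with its corner at exactly 0.25 f_s (not normalized).

    num_taps should be odd; with num_taps = 4m + 3 the outermost taps are nonzero.
    """
    half = num_taps // 2
    taps = []
    for k in range(-half, half + 1):
        # Ideal lowpass with corner f_s/4: sin(pi*k/2) / (pi*k), and 0.5 at k = 0.
        ideal = 0.5 if k == 0 else math.sin(math.pi * k / 2) / (math.pi * k)
        window = 0.5 + 0.5 * math.cos(math.pi * k / (half + 1))
        taps.append(ideal * window)
    return taps

taps = halfband_taps(11)
for k, t in zip(range(-5, 6), taps):
    print(f"{k:+d}: {t:+.6f}")   # even offsets other than 0 come out at zero, within roundoff
```

A hardware implementation can skip the multiplications for the zero taps, which is the reduction in complexity noted in the margin.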

The approach to two-times oversampled D-to-A conversion is comparable. The D-to-A device operates at 27 MHz; it is presented with a datastream that has been upsampled by a 1:2 ratio. For each input sample, the 2x-oversampling filter computes 2 output samples. One is computed at the effective location of the input sample, and the other is computed at an effective location halfway between input samples. The filter attenuates power between 6.75 MHz and 13.5 MHz; the analog postsampling filter need only reject components at and above 13.5 MHz. As in two-times oversampled A-to-D conversion, its performance between 6.75 MHz and 13.5 MHz isn't critical.
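The two-outputs-per-input structure can be sketched in polyphase form. Assumptions, all illustrative: the function name `oversample_2x`, edge handling by repeating end samples, and a 4-tap kernel [-1, 9, 9, -1]/16 (the cubic Lagrange weights at a half-sample offset) for the halfway samples. Here the on-site output simply copies the input; in a real oversampling filter it is generally filtered as well, and the kernel would be much longer.

```python
def oversample_2x(samples: list[float]) -> list[float]:
    """1:2 upsampling that computes two outputs per input (polyphase form).

    Even outputs sit at input sample sites and copy the input; odd outputs sit
    halfway between inputs and use a hypothetical 4-tap kernel
    [-1, 9, 9, -1] / 16 (cubic Lagrange weights at a half-sample offset).
    Samples beyond either end are taken to repeat the end values.
    """
    n = len(samples)

    def at(i: int) -> float:
        return samples[min(max(i, 0), n - 1)]   # clamp indices at the edges

    out = []
    for i in range(n):
        out.append(at(i))                        # phase 0: the input site
        if i < n - 1:                            # phase 1/2: halfway to the next
            out.append((-at(i - 1) + 9 * at(i) + 9 * at(i + 1) - at(i + 2)) / 16)
    return out

print(oversample_2x([0.0, 1.0, 8.0, 27.0]))
# [0.0, 0.0625, 1.0, 3.375, 8.0, 17.9375, 27.0]
```

For these samples of x³, only the interior midpoint (3.375, which is 1.5³) is exact; the outer midpoints are distorted by the edge clamping.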

Interpolation 

In mathematics, interpolation is the process of computing the value of a function or a putative function (call it $g$), for an arbitrary argument ($x$), given several function argument and value pairs $[x_i, s_i]$.

There are many methods for interpolating, and many 
methods for constructing functions that interpolate. 

Given two sample pairs $[x_0, s_0]$ and $[x_1, s_1]$, the linear interpolation function has this form:

$$
\tilde{g}(x) = s_0 + \frac{x - x_0}{x_1 - x_0}\,(s_1 - s_0) \qquad\text{(Eq 17.1)}
$$

I symbolize the interpolating function as $g$; the symbol $f$ is already taken to represent frequency. I write $g$ with a tilde ($\tilde{g}$) to emphasize that it is an approximation.
The linear interpolation function can be rewritten as a weighted sum of the neighboring samples $s_0$ and $s_1$:

$$
\tilde{g}(x) = c_0(x)\,s_0 + c_1(x)\,s_1 \qquad\text{(Eq 17.2)}
$$

The weights depend upon the $x$ (or $t$) coordinate:

$$
c_0(x) = \frac{x_1 - x}{x_1 - x_0}, \qquad c_1(x) = \frac{x - x_0}{x_1 - x_0} \qquad\text{(Eq 17.3)}
$$
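Equations 17.2 and 17.3 translate directly into code. This minimal sketch uses hypothetical function names; note that the two weights always sum to 1, and that at $x = x_0$ the result is exactly $s_0$.

```python
def linear_weights(x: float, x0: float, x1: float) -> tuple[float, float]:
    """Weights c0(x), c1(x) of Eq 17.3."""
    c0 = (x1 - x) / (x1 - x0)
    c1 = (x - x0) / (x1 - x0)
    return c0, c1

def lerp(x: float, x0: float, s0: float, x1: float, s1: float) -> float:
    """Linear interpolation (Eq 17.2): a weighted sum of the two neighbours."""
    c0, c1 = linear_weights(x, x0, x1)
    return c0 * s0 + c1 * s1

print(lerp(2.5, 2.0, 10.0, 3.0, 20.0))   # 15.0, halfway between the samples
```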



Julius O. Smith calls this Waring- 
Lagrange interpolation, since Waring 
published it 16 years before 
Lagrange. See Smith's Digital Audio 
Resampling Home Page, <www-ccrma.stanford.edu/~jos/resample>.



Lagrange interpolation 

J.L. Lagrange (1736-1813) developed a method of interpolation using polynomials. A cubic interpolation function is a polynomial of this form:

$$
\tilde{g}(x) = a x^3 + b x^2 + c x + d \qquad\text{(Eq 17.4)}
$$

Interpolation involves choosing appropriate coefficients a, b, c, and d, based upon the given argument/value pairs $[x_i, s_i]$. Lagrange described a simple and elegant way of computing the coefficients.

Linear interpolation is just a special case of Lagrange 
interpolation of the first degree. (Directly using the 
value of the nearest neighbor can be considered zero- 
order interpolation.) There is a second-degree 
(quadratic) form; it is rarely used in signal processing. 



In mathematics, to interpolate refers to the process that I have described. However, the same word is used to denote the property whereby an interpolating function produces values exactly equal to the original sample values ($s_i$) at the original sample coordinates ($x_i$). The Lagrange functions exhibit this property. You might guess that this property is a requirement of any interpolating function. However, in signal processing this is not a requirement; in fact, the interpolation functions used in video and audio rarely pass exactly through the original sample values. As a consequence of using the terminology of mathematics, in video we have the seemingly paradoxical situation that interpolation functions usually do not "interpolate"!



In principle, cubic interpolation could be undertaken for any argument $x$, even values outside the $x$-coordinate range of the four input samples. (Evaluation outside the interval $[x_{-1}, x_2]$ would be called extrapolation.) In digital video and audio, we limit $x$ to the range between $x_0$ and $x_1$, so as to estimate the signal in the interval between the central two samples. To evaluate outside this interval, we substitute the input sample values $[s_{-1}, s_0, s_1, s_2]$ appropriately; for example, to evaluate between $s_1$ and $s_2$, we shift the input sample values left one place.

CHAPTER 17 RESAMPLING, INTERPOLATION, AND DECIMATION 177

Figure 17.6 Cubic interpolation of a signal starts with equally spaced samples, in this example 47, 42, 43, and 46. The underlying function is estimated to be a cubic polynomial that passes through ("interpolates") all four samples. The polynomial is evaluated between the two central samples, as shown by the black segment. Here, evaluation is at phase offset $\varphi$. If the underlying function isn't a polynomial, small errors are produced. [Graph: horizontal axis, sample coordinate.]

With uniform sampling (as in conventional digital video), when interpolating between the two central samples the argument $x$ can be recast as the phase offset, or the fractional phase ($\varphi$, phi), at which a new sample is required between the two central samples:

$$
\varphi = \frac{x - x_0}{x_1 - x_0}, \qquad x_0 \le x \le x_1 \qquad\text{(Eq 17.5)}
$$

In abstract terms, $\varphi$ lies between 0 and 1; in hardware, it is implemented as a binary or a rational fraction. In video, a 1-D interpolator is usually an FIR filter whose weighting coefficients ($c_i$) are functions of the phase offset; they can be considered as basis functions.

In signal processing, cubic (third-degree) interpolation 
is often used; the situation is sketched in Figure 17.6 
above. In linear interpolation, one neighbor to the left 
and one to the right are needed. In cubic interpolation, 
we ordinarily interpolate in the central interval, using 
two original samples to the left and two to the right of 
the desired sample instant. 
Equation 17.2 can be reformulated:

$$
\tilde{g}(\varphi) = c_{-1}(\varphi)\,s_{-1} + c_0(\varphi)\,s_0 + c_1(\varphi)\,s_1 + c_2(\varphi)\,s_2 \qquad\text{(Eq 17.6)}
$$

The function takes four sample values $[s_{-1}, s_0, s_1, s_2]$ surrounding the interval of interest, and the phase offset $\varphi$ between 0 and 1. The coefficients ($c_i$) are now functions of the argument $\varphi$: The interpolator forms a weighted sum of the four sample values, with weights that depend on $\varphi$, and returns an estimated value. (If the input samples are values of a polynomial not exceeding the third degree, then the values produced by a cubic Lagrange interpolator are exact, within roundoff error: Lagrange interpolation "interpolates"!)
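For samples at offsets -1, 0, 1, and 2, the cubic Lagrange weights of Eq 17.6 follow from Lagrange's product formula $c_j(\varphi) = \prod_{m \ne j} (\varphi - m)/(j - m)$. This minimal sketch (function names hypothetical) evaluates them and applies them to the samples of Figure 17.6; at $\varphi = 0.5$ the weights are [-1, 9, 9, -1]/16.

```python
def lagrange4_weights(phi: float) -> tuple[float, float, float, float]:
    """Cubic Lagrange weights c_-1 .. c_2 for samples at offsets -1, 0, 1, 2,
    evaluated at phase offset phi (0 <= phi <= 1) between s_0 and s_1."""
    return (
        -phi * (phi - 1) * (phi - 2) / 6,
        (phi + 1) * (phi - 1) * (phi - 2) / 2,
        -(phi + 1) * phi * (phi - 2) / 2,
        (phi + 1) * phi * (phi - 1) / 6,
    )

def cubic_interpolate(s: tuple[float, float, float, float], phi: float) -> float:
    """Eq 17.6: weighted sum of the four samples [s_-1, s_0, s_1, s_2]."""
    return sum(c * v for c, v in zip(lagrange4_weights(phi), s))

print(cubic_interpolate((47, 42, 43, 46), 0.5))   # 42.0
```

Because the weights reproduce any cubic exactly, samples of $x^3 - 2x + 1$ at $x = -1, 0, 1, 2$ (that is, 2, 1, 0, 5) yield exactly 0.125 at $\varphi = 0.5$.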



Smith, A.R., “Planar 2-pass texture 
mapping and warping,” in Computer 
Graphics 21 (4): 12-19 (Jul. 1987, 
Proc. SIGGRAPH 87), 263-272. 



If a 2-D image array is to be resampled at arbitrary x and y coordinate values, one approach is to apply a 1-D filter along one axis, then apply a 1-D filter along the other axis. This approach treats interpolation as a separable process, akin to the separable filtering that I will introduce on page 191. Surprisingly, this two-pass approach can be used to rotate an image; see Smith, cited in the margin. Alternatively, a 2×2 array (of 4 sample values) can be used for linear interpolation in 2 dimensions in one step: this is bilinear interpolation. A more sophisticated approach is to use a 4×4 array (of 16 sample values) as the basis for cubic interpolation in 2 dimensions: this is bicubic interpolation. (It is mathematically comparable to 15th-degree interpolation in one dimension.)
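Bilinear interpolation in a 2×2 block can be computed separably: interpolate along x in each row, then along y between the two results. This minimal sketch uses hypothetical names and assumes the block is stored as rows (y) of columns (x).

```python
def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b at fraction t in [0, 1]."""
    return a + t * (b - a)

def bilinear(block: list[list[float]], fx: float, fy: float) -> float:
    """Bilinear interpolation in a 2x2 block [[s00, s01], [s10, s11]]
    (rows are y, columns are x) at fractional offsets fx, fy.

    Computed separably: interpolate along x in each row, then along y.
    """
    top = lerp(block[0][0], block[0][1], fx)
    bottom = lerp(block[1][0], block[1][1], fx)
    return lerp(top, bottom, fy)

print(bilinear([[0.0, 1.0], [2.0, 3.0]], 0.5, 0.5))   # 1.5
```

Bicubic interpolation follows the same pattern with four-sample cubic interpolation along each axis over a 4×4 block.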



Bartels, Richard H., John C. 
Beatty, and Brian A. Barsky, An 
Introduction to Splines for Use in 
Computer Graphics and Geometric 
Modeling (San Francisco: Morgan 
Kaufmann, 1989). 



Curves can be drawn in 2-space using a parameter u as 
the argument to each of two functions x(u) and y(u) 
that produce a 2-D coordinate pair for each value of u. 
Cubic polynomials can be used as x(u) and y(u). This 
approach can be extended to three-space by adding 
a third function, z(u). Pierre Bézier developed 
a method, now widely used, of describing curves and 
surfaces with cubic polynomials. Such curves are 
now known as Bézier curves or Bézier splines. The 
method is very important in the field of computer 
graphics; however, Bézier splines and their relatives are 
infrequently used in signal processing. 
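A sketch of a cubic Bézier curve, evaluated in the standard Bernstein-polynomial form: each coordinate function x(u), y(u) (and z(u), if present) is a cubic in u shaped by four control points. The function name is hypothetical.

```python
def cubic_bezier(p0, p1, p2, p3, u):
    """Point on a cubic Bezier curve at parameter u in [0, 1].

    Control points are tuples such as (x, y) or (x, y, z); every coordinate
    is an independent cubic polynomial in u.
    """
    b0 = (1 - u) ** 3            # Bernstein basis weights; they sum to 1
    b1 = 3 * u * (1 - u) ** 2
    b2 = 3 * u ** 2 * (1 - u)
    b3 = u ** 3
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))


print(cubic_bezier((0, 0), (1, 1), (2, 2), (3, 3), 0.5))  # (1.5, 1.5)
```

The curve starts at the first control point (u = 0) and ends at the last (u = 1); the two inner points pull the curve toward them without, in general, lying on it.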
Only symmetric FIR filters exhibit 
true linear phase. Other FIR 
filters exhibit very nearly linear 
phase, close enough to be 
considered to have linear phase 
in video and audio. 



Lagrange interpolation as filtering 

Except for having 4 taps instead of 5, Equation 17.6 has 
identical form to the 5-tap Gaussian filter of 
Equation 16.2, on page 158! Lagrange interpolation 
can be viewed as a special case of FIR filtering, and can 
be analyzed as a filtering operation. In the previous 
chapter, Filtering and sampling, all of the examples were 
symmetric. Interpolation to produce samples exactly 
halfway between input samples, such as in a two-times 
oversampling DAC, is also symmetric. However, most 
interpolators are asymmetric. 



There are four reasons why polynomial interpolation is 
generally unsuitable for video signals: Polynomial 
interpolation has unequal stopband ripple; nulls lie at fixed 
positions in the stopband; the interpolating function 
exhibits extreme behavior outside the central interval; 
and signals presented to the interpolator are somewhat 
noisy. I will address each of these issues in turn. 

• Any Lagrange interpolator has a frequency response 
with unequal stopband ripple, sometimes highly 
unequal. That is generally undesirable in signal 
processing, and it is certainly undesirable in video. 



• A Lagrange interpolator "interpolates" the original 
samples; this causes a magnitude frequency response 
that has periodic nulls ("zeros") whose frequencies are 
fixed by the order of the interpolator. In order for 
a filter designer to control stopband attenuation, he or 
she needs the freedom to place nulls judiciously. This 
freedom is not available in the design of a Lagrange 
interpolator. 



• Conceptually, interpolation attempts to model, with 
a relatively simple function, the unknown function that 
generated the samples. The form of the function that 
we use should reflect the process that underlies genera- 
tion of the signal. A cubic polynomial may deliver 
sensible interpolated values between the two central 
points. However, the value of any polynomial rapidly 
shoots off to plus or minus infinity at arguments outside 
the region where it is constrained by the original 
sample values. That property is at odds with the 
behavior of signals, which are constrained to lie within 
a limited range of values forever (say the abstract range 
0 to 1 in video, or ±0.5 in audio). 

• In signal processing, there is always some uncertainty in 
the sample values caused by noise accompanying the 
signal, quantization noise, and noise due to roundoff 
error in the calculations in the digital domain. When 
the source data is imperfect, it seems unreasonable to 
demand perfection of an interpolation function. 



You can consider the entire stop- 
band of an ideal sinc filter to contain 
an infinity of nulls. Mathematically, 
the sinc function represents the limit 
of Lagrange interpolation as the 
order of the polynomial approaches 
infinity. See Appendix A of Smith's 
Digital Audio Resampling Home Page, 
cited in the margin of page 177. 

The 720p60 and 1080i30 stan- 
dards have an identical sampling 
rate (74.25 MHz). In the logic 
design of this example, there is 
a single clock domain. 
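Treating Equation 17.6 as a 4-tap FIR filter, a short sketch can make two of these points concrete. The taps at φ = 1/2 are symmetric, while those at other phases (such as φ = 1/4) are not; and the halfway interpolator has a null at half the sampling rate whose position follows from the cubic form, leaving the designer no parameter with which to move it. Frequency f is assumed to be in cycles per input sample; the function names are hypothetical.

```python
import cmath
import math


def lagrange_taps(phi):
    """Cubic Lagrange weights for input samples at offsets -1, 0, 1, 2."""
    return [
        -phi * (phi - 1) * (phi - 2) / 6,
        (phi + 1) * (phi - 1) * (phi - 2) / 2,
        -(phi + 1) * phi * (phi - 2) / 2,
        (phi + 1) * phi * (phi - 1) / 6,
    ]


def magnitude_response(phi, f):
    """|H(f)| of the 4-tap interpolator at phase phi; f in cycles per input sample."""
    h = sum(c * cmath.exp(-2j * math.pi * f * (k - phi))
            for c, k in zip(lagrange_taps(phi), (-1, 0, 1, 2)))
    return abs(h)


halfway = lagrange_taps(0.5)    # [-1/16, 9/16, 9/16, -1/16]
quarter = lagrange_taps(0.25)
print(halfway == halfway[::-1], quarter == quarter[::-1])  # True False
for f in (0.0, 0.125, 0.25, 0.375, 0.5):
    print(f, round(magnitude_response(0.5, f), 4))
```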

These four issues are addressed in signal processing by 
using interpolation functions that are not polynomials 
and that do not come from classical mathematics. 
Instead, we usually use interpolation functions based 
upon the sinc weighting function that I introduced 
on page 148. In signal processing, we usually design 
interpolators that do not "interpolate" the original 
sample values. 

The ideal sinc weighting function has no distinct nulls in 
its frequency spectrum. When sinc is truncated and 
optimized to obtain a physically realizable filter, the 
stopband has a finite number of nulls. Unlike 
a Lagrange interpolator, these nulls do not have to be 
regularly spaced. It is the filter designer's ability to 
choose the frequencies for the zeros that allows him or 
her to tailor the filter's response. 
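As a rough sketch of a sinc-based interpolator, the weights below truncate the ideal sinc to 8 taps with a Hann window and normalize them for unity gain at DC. The tap count and the Hann window are assumptions made for illustration; this is a plain windowed sinc, not one of the optimized designs described above, and unlike those designs it still reproduces the input exactly at φ = 0. The function names are hypothetical.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def windowed_sinc_taps(phi, taps=8):
    """Interpolator weights at phase offset phi (0 <= phi < 1).

    Taps sit at integer offsets k = -(taps/2 - 1) .. taps/2 from sample s_0.
    The sinc is truncated with a Hann window spanning +/- taps/2, then the
    weights are scaled so they sum to 1 (unity DC gain).
    """
    half = taps // 2
    weights = []
    for k in range(-(half - 1), half + 1):
        d = k - phi  # distance from the output site to input sample k
        window = 0.5 * (1 + math.cos(math.pi * d / half))
        weights.append(sinc(d) * window)
    total = sum(weights)
    return [w / total for w in weights]


print([round(w, 4) for w in windowed_sinc_taps(0.5)])
```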

Polyphase interpolators 

Some video signal processing applications require 
upsampling at simple ratios. For example, conversion 
from 1280 S_AL to 1920 S_AL (samples per active line) in an HDTV format 
converter requires 2:3 upsampling. An output sample is 
computed at one of three phases: either at the site of 
an input sample, or 1/3 or 2/3 of the way between input 
samples. The upsampler can be implemented as an FIR 
filter with just three sets of coefficients; the coefficients 
can be accessed from a lookup table addressed by φ. 
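A minimal sketch of the addressing for 2:3 upsampling, using exact rational arithmetic: output n lies at input position 2n/3, and its fractional part is the phase φ that would select one of the three coefficient sets from the lookup table. The function name is hypothetical.

```python
import math
from fractions import Fraction


def output_sites(up, down, count):
    """(input index, phase) for the first `count` outputs of a resampler
    that delivers `up` output samples for every `down` input samples."""
    step = Fraction(down, up)              # output spacing in input-sample units
    sites = []
    for n in range(count):
        position = n * step
        index = math.floor(position)       # input sample at or before the output
        sites.append((index, position - index))  # phase phi in [0, 1)
    return sites


sites = output_sites(3, 2, 6)              # 2:3 upsampling
print(sorted({phase for _, phase in sites}))  # the three phases 0, 1/3, 2/3
```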

Many interpolators involve ratios more complex than 
the 2:3 ratio of this example. For example, in conver- 
sion from 4fSC NTSC to Rec. 601 (4:2:2), 910 input 
samples must be converted to 858 results. This involves 


CHAPTER 17 



RESAMPLING, INTERPOLATION, AND DECIMATION 



181 



a downsampling ratio of 35:33. Successive output 
samples are computed at an increment of 1 2/33 (that is, 35/33) input 
samples. Every 33rd output sample is computed at the 
site of an input sample (0); other output samples are 
computed at input sample coordinates 1 2/33, 2 4/33, ..., 
16 32/33, 18 1/33, 19 3/33, ..., 33 31/33. Addressing 
circuitry needs to increment a sample counter by one, 
and a fractional numerator by 2 modulo 33 (yielding 
the fraction 2/33), at each output sample. Overflow from 
the fraction counter carries into the sample counter; 
this accounts for the missing input sample number 17 
in the sample number sequence of this example. The 
required interpolation phases are at fractions φ = 0, 
1/33, 2/33, 3/33, ..., 32/33 between input samples. 
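The counter arrangement described above can be sketched in a few lines. This is a software model of the addressing only, assuming the sample counter starts at 0; the generator name is hypothetical.

```python
def phase_accumulator(count, whole=1, numerator=2, modulus=33):
    """Yield (input sample number, phase numerator) for successive outputs.

    The step is whole + numerator/modulus input samples (1 2/33 here). The
    sample counter advances by `whole` at each output; the fraction counter
    advances by `numerator` modulo `modulus`, and its overflow carries into
    the sample counter.
    """
    sample, frac = 0, 0
    for _ in range(count):
        yield sample, frac          # interpolation phase phi = frac / modulus
        sample += whole
        frac += numerator
        if frac >= modulus:         # overflow carries into the sample counter
            frac -= modulus
            sample += 1


for n, (sample, frac) in enumerate(phase_accumulator(34)):
    if n in (0, 1, 16, 17, 18, 32, 33):
        print(n, sample, f"{frac}/33")
```

Output 17 is computed at 18 1/33, so input sample 17 never appears as a base sample; likewise sample 34 is skipped when output 33 lands on input sample 35.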



In the logic design of this example, 
two clock domains are involved. 



A straightforward approach to design of this interpo- 
lator in hardware is to drive an FIR filter at the input 
sample rate. At each input clock, the input sample 
values shift across the registers. Addressing circuitry 
implements a modulo-33 counter to keep track of 
phase - a phase accumulator. At each clock, one of 
33 different sets of coefficients is applied to the filter. 
Each coefficient set is designed to introduce the appro- 
priate phase shift. In this example, only 33 result 
samples are required every 35 input clocks: During 
2 clocks of every 35, no result is produced. 



This structure is called a polyphase filter. This example 
involves 33 phases; however, the number of taps 
required is independent of the number of phases. A 
2×-oversampled prefilter, such as I described on page 174, 
has just two phases. The halfband filter whose response 
is graphed in Figure 16.25, on page 166, would be suit- 
able for this application; that filter has 55 taps. 
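A software sketch of a polyphase resampler with a precomputed table of coefficient sets, one per phase (33 for the 35:33 ratio). Each set here is a cubic (4-tap) Lagrange interpolator purely for illustration, not a recommended design; the loop is driven by output positions rather than by an input clock, and edge samples are replicated so that the first output can be formed. The names are hypothetical.

```python
def lagrange_taps(phi):
    """Cubic Lagrange weights for input samples at offsets -1, 0, 1, 2."""
    return [
        -phi * (phi - 1) * (phi - 2) / 6,
        (phi + 1) * (phi - 1) * (phi - 2) / 2,
        -(phi + 1) * phi * (phi - 2) / 2,
        (phi + 1) * phi * (phi - 1) / 6,
    ]


def polyphase_resample(x, up, down, count):
    """Produce `count` outputs from list x at `up` outputs per `down` inputs."""
    table = [lagrange_taps(p / up) for p in range(up)]   # one set per phase
    last = len(x) - 1
    out = []
    for n in range(count):
        index, phase = divmod(n * down, up)   # position = index + phase/up
        coeffs = table[phase]                 # coefficient set chosen by phase
        acc = 0.0
        for c, k in zip(coeffs, (-1, 0, 1, 2)):
            acc += c * x[min(max(index + k, 0), last)]   # replicate edge samples
        out.append(acc)
    return out


ramp = [float(k) for k in range(100)]
print(polyphase_resample(ramp, 33, 35, 5))  # samples of the ramp at n * 35/33
```

Because a cubic interpolator reproduces straight lines exactly, resampling a ramp is a convenient check that each output picks the right base sample and the right coefficient set.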

Polyphase taps and phases 

The number of taps required in a filter is determined by 
the degree of control that the designer needs to exer- 
cise over frequency response, and by how tightly the 
filters in each phase need to match each other. In many 
cases of consumer-grade video, cubic (4-tap) interpola- 
tion is sufficient. In studio video, eight taps or more 
might be necessary, depending upon the performance 
to be achieved. 
In a direct implementation of a polyphase FIR interpo- 
lator, the number of phases is determined by the arith- 
metic that relates the sampling rates. The number of 
phases determines the number of coefficient sets that 
need to be used. Coefficient sets are typically precom- 
puted and stored in nonvolatile memory. 

On page 181, I described a polyphase resampler having 
33 phases. In some applications, the number of phases 
is impractically large to implement directly. This is the 
case for the 709379:540000 ratio required to convert 
from 4fSC PAL to Rec. 601 (4:2:2), from about 
922 active samples per line to about 702. In other 
applications, such as digital video effects, the number 
of phases is variable, and unknown in advance. Applica- 
tions such as these can be addressed by an interpolator 
having a number of phases that is a suitable power of 
two, such as 256 phases. Phase offsets are computed to 
the appropriate degree of precision, but are then 
approximated to a binary fraction (in this case having 
8 bits) to form the phase offset φ that is presented to 
the interpolator. 

If the interpolator implements 8 fractional bits of phase, 
then any computed output sample may exhibit a posi- 
tional error of up to ±1/512 of a sample interval. This is 
quite acceptable for component digital video. However, 
if the phase accumulator implements just 8 fractional 
bits, that positional error will accumulate as the incre- 
mental computation proceeds across the image row. In 
this example, with 922 active samples per line, the 
error could reach 3 or 4 sample intervals at the right- 
hand end of the line! This isn't tolerable. The solution is 
to choose a sufficient number of fractional bits in the 
phase accumulator to keep the cumulative error within 
limits. In this example, 13 bits are sufficient, but only 
8 of those bits are presented to the interpolator. 
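The arithmetic of accumulator precision can be checked with a short sketch. It assumes that the per-output increment 709379/540000 is truncated to b fractional bits, and that about 702 increments span the line (922 / 1.3137 ≈ 702); the function names are hypothetical. A worst-case bound is the number of increments times 2^(-b); taking the count as 922, as the text does, gives about 3.6 sample intervals for 8 bits. The helper `interpolator_phase` shows how only the top 8 of 13 fractional bits would be presented as φ.

```python
from fractions import Fraction

RATIO = Fraction(709379, 540000)   # input samples advanced per output sample


def accumulated_drift(frac_bits, steps, ratio=RATIO):
    """Position error (input samples) after `steps` increments when the
    increment is truncated to `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    increment = (ratio.numerator * scale) // ratio.denominator  # fixed point
    acc = 0
    for _ in range(steps):
        acc += increment                  # the phase accumulator
    return float(ratio * steps - Fraction(acc, scale))


def worst_case_drift(frac_bits, steps):
    """Upper bound: each step may lose up to one LSB of the accumulator."""
    return steps / (1 << frac_bits)


def interpolator_phase(acc, acc_bits=13, phase_bits=8):
    """Top `phase_bits` of the accumulator's fractional part, presented as phi."""
    fraction = acc & ((1 << acc_bits) - 1)
    return fraction >> (acc_bits - phase_bits)


for bits in (8, 13):
    print(bits, round(accumulated_drift(bits, 702), 4), worst_case_drift(bits, 922))
```

For this particular ratio the actual truncation loss per step is below one LSB, so the computed drift comes in under the worst-case bound; the bound is what a design that must handle arbitrary ratios has to budget for.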

Implementing polyphase interpolators 

Polyphase interpolation is a specialization of FIR 
filtering; however, there are three major implementa- 
tion differences. First, in a typical FIR filter, the input 
and output rates are the same; in a polyphase interpo- 
lator, the input and output rates are usually different. 
Second, FIR filters usually have fixed coefficients; in 
a polyphase FIR interpolator, the coefficients vary on 
a sample-by-sample basis. Third, typical FIR filters are 
symmetrical, but polyphase interpolators are not. 

Generally speaking, for a small number of phases - 
perhaps 8 or fewer - the cost of an interpolator is 
dominated by the number of multiplication operations, 
which is proportional to the number of taps. Beyond 
about 8 taps, the cost of coefficient storage begins to 
be significant. The cost of the addressing circuitry 
depends only upon the number of phases. 

In the 35:33 downsampler example, I discussed 
a hardware structure driven by the input sample rate. 
Suppose the hardware design requires that the interpo- 
lator be driven by the output clock. For 31 of each 33 
output clocks, one input sample is consumed; however, 
for 2 clocks, two input samples are consumed. This 
places a constraint on memory system design: Either 
two paths from memory must be implemented, or the 
extra 44 samples per line must be accessed during the 
blanking interval, and be stored in a small buffer. It is 
easier to drive this interpolator from the input clock. 

Consider a 33:35 upsampler, from Rec. 601 to 4fSC 
NTSC. If driven from the output side, the interpolator 
produces one output sample per clock, and consumes 
at most one input sample per clock. (For 2 of the 
35 output clocks, no input samples are consumed.) If 
driven from the input side, for 2 of the 33 input clocks, 
the interpolator must produce two output samples. This 
is likely to present problems to the design of the FIR 
filter and the output side memory system. 

The lesson is this: The structure of a polyphase interpo- 
lator is simplified if it is driven from the high-rate side. 
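Counting how many input samples each output clock must fetch, for a resampler driven from the output side, reproduces the figures above. This is a sketch with a hypothetical function name, where output n lies at input position n · down / up.

```python
from collections import Counter


def inputs_consumed_per_output_clock(up, down, outputs):
    """Histogram of new input samples fetched per output clock for a
    resampler producing `up` outputs per `down` inputs, driven from the
    output side."""
    counts = []
    for n in range(outputs):
        before = (n * down) // up        # base input index for output n
        after = ((n + 1) * down) // up   # base input index for output n + 1
        counts.append(after - before)
    return Counter(counts)


print(inputs_consumed_per_output_clock(33, 35, 33))  # 35:33 downsampler
print(inputs_consumed_per_output_clock(35, 33, 35))  # 33:35 upsampler
```

The downsampler needs two fetches on 2 of every 33 output clocks, while the upsampler never needs more than one, which is why it is simpler to drive each design from its high-rate side.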

Decimation 

In Lagrange interpolation, no account is taken of 
whether interpolation computes more or fewer output 
samples than input samples. However, in signal 
processing, there is a big difference between downsam- 
pling, where lowpass filtering is necessary to prevent 
aliasing, and upsampling, where lowpass filtering is 
necessary to suppress "imaging." In signal processing, 
the term interpolation generally implies upsampling, 
that is, resampling to any ratio of unity or greater. (The 
term interpolation also describes phase shift without 
sample rate change; think of this as the special case of 
upsampling with a ratio of 1:1.) 



Taken literally, decimation involves 
a ratio of 10:9, not 10:1. 



Downsampling with a ratio of 10:9 is analogous to the 
policy by which the Roman army dealt with treachery 
and mutiny among its soldiers: One in ten of the 
offending soldiers was put to death. Their term decima- 
tion has come to describe downsampling in general. 

Lowpass filtering in decimation 

Earlier in this chapter, I expressed chroma subsampling 
as 2:1 decimation. In a decimator, samples are lowpass 
filtered to attenuate components at and above half the 
new sampling rate; then samples are dropped. Obvi- 
ously, samples that are about to be dropped need not 
be computed! Ordinarily, the sample-dropping and 
filtering are incorporated into the same circuit. 
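A minimal sketch of 2:1 decimation in which the lowpass filter is evaluated only at the surviving output sites, checked against filtering at the full rate and then dropping every other result. The [1, 2, 1]/4 taps are an assumption chosen for brevity, not a recommended chroma filter; the function names are hypothetical.

```python
def filter_then_drop(x, taps):
    """Reference: FIR filter at the input rate, then discard every other output."""
    n = len(taps)
    filtered = [sum(t * x[i + k] for k, t in enumerate(taps))
                for i in range(len(x) - n + 1)]
    return filtered[::2]


def decimate_2to1(x, taps):
    """Same result, computing only the outputs that survive decimation."""
    n = len(taps)
    return [sum(t * x[i + k] for k, t in enumerate(taps))
            for i in range(0, len(x) - n + 1, 2)]


taps = (0.25, 0.5, 0.25)
x = [float(v) for v in (3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5)]
print(decimate_2to1(x, taps))
```

The second version does half the multiplications for the same output, which is the saving a combined filter-and-drop circuit exploits.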



For details of interpolators and 
decimators, see Crochiere, Ronald 
E., and Lawrence R. Rabiner, 
Multirate Digital Signal Processing 
(New York: Prentice-Hall, 1983). 



In the example of halfband decimation for chroma 
subsampling, I explained the necessity of lowpass 
filtering to 0.25fS. In the 4fSC NTSC to Rec. 601 
example that I presented in Polyphase interpolators, on 
page 181, the input and output sample rates were so 
similar that no special attention needed to be paid to 
bandlimiting at the result sample rate. If the downsampling 
ratio is much greater than unity (say 5:4, or greater), 
then the impulse response must incorporate a lowpass 
filtering (prefiltering, or antialiasing) function as well as 
phase shift. To avoid aliasing, the lowpass corner 
frequency must scale with the downsampling ratio. This 
may necessitate several sets of filter coefficients having 
different corner frequencies. 
Image digitization 
and reconstruction 

Figure 18.1 Horizontal domain 




Figure 18.2 Vertical domain 




Figure 18.3 Temporal domain 




Figure 18.4 Spatial domain 



In Chapter 16, Filtering and sampling, on page 141, 
I described how to analyze a signal that is a function of 
the single dimension of time, such as an audio signal. 
Sampling theory also applies to a signal that is 
a function of one dimension of space, such as a single 
scan line (image row) of a video signal. This is the hori- 
zontal or transverse domain, sketched in Figure 18.1 in 
the margin. If an image is scanned line by line, the 
waveform of each line can be treated as an indepen- 
dent signal. The techniques of filtering and sampling in 
one dimension, discussed in the previous chapter, apply 
directly to this case. 

Consider a set of points arranged vertically that origi- 
nate at the same displacement along each of several 
successive image rows, as sketched in Figure 18.2. 
Those points can be considered to be sampled by the 
scanning process itself. Sampling theory can be used to 
understand the properties of these samples. 

A third dimension is introduced when a succession of 
images is temporally sampled to represent motion. 
Figure 18.3 depicts samples in the same column and 
the same row in three successive frames. 

Complex filters can act on two axes simultaneously. 
Figure 18.4 illustrates spatial sampling. The properties 
of the entire set of samples are considered all at once, 
and cannot necessarily be separated into independent 
horizontal and vertical aspects. 
Figure 18.5 Horizontal 
spatial frequency domain 

Spatial frequency domain 

I explained in Image structure, on page 43, how a one- 
dimensional waveform in time transforms to a one- 
dimensional frequency spectrum. This concept can be 
extended to two dimensions: The two dimensions of 
space can be transformed into two-dimensional spatial 
frequency. The content of an image can be expressed as 
horizontal and vertical spatial frequency components. 
Spatial frequency is plotted using cycles per picture 
width (C/PW) as an x-coordinate, and cycles per picture 
height (C/PH) as a y-coordinate. You can gain insight 
into the operation of an imaging system by exploring its 
spatial frequency response. 



In the image at the top left of Figure 18.5 above, every 
image row has identical content: 4 cycles of a sine 
wave. Underneath the image, I sketch the time domain 
waveform of every line. Since every line is identical, no 
power is present in the vertical direction. Considered in 
the spatial domain, this image contains power at 
a single horizontal spatial frequency, 4 C/PW; there is 
no power at any vertical spatial frequency. All of the 
power of this image lies at spatial frequency [4, 0]. 

Figure 18.6 opposite shows an image comprising 
a sine wave signal in the vertical direction. The height of 
the picture contains 3 cycles. The spatial frequency 
graph, to the right, shows that all of the power of the 
image is contained at coordinates [0, 3] of spatial 
frequency. In an image where each image row takes 
a constant value, all of the power is located on the 
y-axis of spatial frequency. 

Figure 18.6 Vertical spatial 
frequency domain 

When spatial frequency is deter- 
mined analytically using the two- 
dimensional Fourier transform, the 
result is plotted in the manner of 
Figure 18.7, where low vertical 
frequencies (that is, low y values) 
are at the bottom. When spatial 
frequency is computed numerically 
using discrete transforms, such as 
the 2-D discrete Fourier transform 
(DFT), the fast Fourier transform 
(FFT), or the discrete cosine trans- 
form (DCT), the result is usually 
presented in a matrix, where low 
vertical frequencies are at the top. 

If an image comprises rows with identical content, all of 
the power will be concentrated on the horizontal axis 
of spatial frequency. If the content of successive scan 
lines varies slightly, the power will spread to nonzero 
vertical frequencies. An image of diagonal bars would 
occupy a single point in spatial frequency, displaced 
from the x-axis and displaced from the y-axis. 
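A direct (slow) 2-D DFT is enough to confirm where the power of such test images lies. The sketch assumes an image of 16 × 8 samples whose width and height each span one picture, so DFT bin indices read directly in C/PW and C/PH; the names are hypothetical. Of each conjugate pair of peaks, the member with positive horizontal frequency is reported.

```python
import cmath
import math


def sinusoid_image(width, height, cycles_x, cycles_y):
    """Image whose content is cos(2*pi*(cycles_x*x/width + cycles_y*y/height))."""
    return [[math.cos(2 * math.pi * (cycles_x * x / width + cycles_y * y / height))
             for x in range(width)] for y in range(height)]


def dominant_spatial_frequency(image):
    """(u, v) in C/PW and C/PH of the strongest 2-D DFT component.

    Bins above half the sampling rate are mapped to negative frequencies;
    of a conjugate pair, the member with u > 0 (or u == 0 and v >= 0) is returned.
    """
    height, width = len(image), len(image[0])
    magnitudes = {}
    for v in range(height):
        for u in range(width):
            coeff = sum(image[y][x] * cmath.exp(-2j * math.pi * (u * x / width + v * y / height))
                        for y in range(height) for x in range(width))
            magnitudes[(u, v)] = abs(coeff)
    peak = max(magnitudes.values())
    for (u, v), m in magnitudes.items():
        if m >= peak * (1 - 1e-9):
            su = u - width if u > width // 2 else u     # signed horizontal frequency
            sv = v - height if v > height // 2 else v   # signed vertical frequency
            if su > 0 or (su == 0 and sv >= 0):
                return su, sv
    return None


print(dominant_spatial_frequency(sinusoid_image(16, 8, 4, 0)))  # (4, 0)
print(dominant_spatial_frequency(sinusoid_image(16, 8, 0, 3)))  # (0, 3)
print(dominant_spatial_frequency(sinusoid_image(16, 8, 4, 3)))  # (4, 3)
```

The first image has identical rows (power on the horizontal axis), the second has constant rows (power on the vertical axis), and the third is a pattern of diagonal bars whose power sits off both axes.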

The spatial frequency that corresponds to half the 
vertical sampling rate depends on the number of 
picture lines. A 480i system has approximately 480 
picture lines: 480 samples occupy the height of the 
picture, and the Nyquist frequency for vertical sampling 
is 240 C/PH. No vertical frequency in excess of this can 
be represented without aliasing. 

In most images, successive rows and columns of 
samples (of R', G', B', or of luma) are very similar; low 
frequencies predominate, and image power tends to 
cluster toward spatial frequency coordinates [0, 0]. 
Figure 18.7 overleaf sketches the spatial frequency 
spectrum of luma in a 480i system. If the unmodulated 
NTSC color subcarrier were an image data signal, it 
would take the indicated location. In composite NTSC, 
chroma is modulated onto the subcarrier; the resulting 
modulated chroma can be thought of as occupying a 
particular region of the spatial frequency plane, as I will 
describe in Spatial frequency spectrum of composite 
NTSC, on page 359. In NTSC encoding, modulated 
chroma is then summed with luma; this causes the 
spectra to be overlaid. If the luma and chroma spectra 
overlap, cross-color and cross-luma interference arti- 
facts can result. 

Figure 18.7 Spatial 
frequency spectrum of 
480i luma is depicted in 
this plot, which resem- 
bles a topographical map. 
The position that unmod- 
ulated NTSC subcarrier 
would take if it were an 
image data signal is 
shown; see page 357. 

Optical transfer function (OTF) 
includes phase. MTF is the 
magnitude of the OTF; it 
disregards phase. 

Figure 18.8 Two samples, 
vertically arranged 

In optics, the terms magnitude frequency response and 
bandwidth are not used. An optical component, 
subsystem, or system is characterized by modulation 
transfer function (MTF), a one-dimensional plot of hori- 
zontal or vertical spatial frequency response. (Depth of 
modulation is a single point quoted from this graph.) 
Technically, MTF is the Fourier transform of the point 
spread function (PSF) or line spread function (LSF). By 
definition, MTF relates to intensity. Since negative light 
power is physically unrealizable, MTF is measured by 
superimposing a high-frequency sinusoidal (modu- 
lating) wave onto a constant level, then taking the ratio 
of output modulation to input modulation. 
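A sketch of that measurement: superimpose a sinusoid on a constant level, pass it through an optical component modeled here (as an assumption) by a [1, 2, 1]/4 line spread function, and take the ratio of output modulation to input modulation. For this LSF the result should equal cos²(πf), with f in cycles per sample. The function names are hypothetical.

```python
import math


def modulation(signal):
    """Depth of modulation (max - min) / (max + min) of a non-negative signal."""
    hi, lo = max(signal), min(signal)
    return (hi - lo) / (hi + lo)


def blur(signal, lsf=(0.25, 0.5, 0.25)):
    """Circular convolution with a line spread function (a hypothetical [1, 2, 1]/4)."""
    n, half = len(signal), len(lsf) // 2
    return [sum(w * signal[(i + k - half) % n] for k, w in enumerate(lsf))
            for i in range(n)]


def measured_mtf(freq, length=64, level=0.5, depth=0.4):
    """Output modulation over input modulation for a sinusoid of `freq`
    cycles per sample on a constant `level` (length holds whole cycles)."""
    test = [level + depth * math.cos(2 * math.pi * freq * i) for i in range(length)]
    return modulation(blur(test)) / modulation(test)


for f in (0.0625, 0.125, 0.25):
    print(f, round(measured_mtf(f), 4), round(math.cos(math.pi * f) ** 2, 4))
```

The constant level keeps the test signal positive, as physical light must be, while the ratio of modulations isolates the response at the test frequency.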

Comb filtering 

In Finite impulse response (FIR) filters, on page 157, 
I described FIR filters operating in the single dimension 
of time. If the samples are from a scan line of an image, 
the frequency response can be considered to represent 
horizontal spatial frequency (in units of C/PW), instead 
of temporal frequency (in cycles per second, or hertz). 

Consider a sample from a digital image sequence, and 
the sample immediately below, as sketched in 
Figure 18.8 in the margin. If the image has 640 active 
(picture) samples per line, and these two samples are 
presented to a comb filter like that of Figure 16.19, on 
page 156, but having 639 zero-samples between the 
two "ones," then the action of the comb filter will be 
identical to the action of a filter having two taps 
weighted [1, 1] operating in the vertical direction. In 
Figure 16.12, on page 153, I graphed the frequency 
response of a one-dimensional [1, 1] filter. The graph in 
Figure 18.9 above shows the response of the comb 
filter, expressed in terms of its response in the vertical 
direction. Here magnitude response is shown normal- 
ized for unity gain at DC; the filter has a response of 
about 0.707 (i.e., it is 3 dB down) at one-quarter the 
vertical sampling frequency. 

Figure 18.9 Response of 
[1, 1] FIR filter operating 
in the vertical domain, 
scaled for unity gain, is 
shown. This is a two-line 
(1H) comb filter. Magni- 
tude falls as cos ω. 

Figure 18.10 Separable 
spatial filter examples 

Figure 18.11 Inseparable 
spatial filter examples 
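In a raster-ordered sample stream, a one-line delay pairs each sample with the one directly above it, so a [1, 1]/2 filter across that delay acts vertically. The sketch below takes the line length as a parameter (640 in the text) and expresses vertical frequency f relative to the vertical sampling frequency; the names are hypothetical.

```python
import math


def line_comb(raster, line_length):
    """[1, 1]/2 comb filter on a raster-ordered stream: each output averages
    a sample with the one line_length samples earlier (the one above it)."""
    return [(raster[i] + raster[i - line_length]) / 2
            for i in range(line_length, len(raster))]


def vertical_response(f):
    """Magnitude response of the [1, 1]/2 filter, f in cycles per line pitch."""
    return abs(math.cos(math.pi * f))


print(line_comb([0, 1, 2, 3, 10, 11, 12, 13, 20, 21, 22, 23], 4))
print(round(vertical_response(0.25), 4), round(20 * math.log10(vertical_response(0.25)), 2))
```

At one-quarter the vertical sampling frequency the response is cos(π/4) ≈ 0.7071, and 20·log10(0.7071) ≈ -3.01, matching the 3 dB figure above.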

Spatial filtering 

Placing a [1, 1] horizontal lowpass filter in tandem with 
a [1, 1] vertical lowpass filter is equivalent to computing 
a weighted sum of spatial samples using the weights 
indicated in the matrix on the left in Figure 18.10. 
Placing a [1, 2, 1] horizontal lowpass filter in tandem 
with a [1, 2, 1] vertical lowpass filter is equivalent to 
computing a weighted sum of spatial samples using the 
weights indicated in the matrix on the right in 
Figure 18.10. These are examples of spatial filters. These 
particular spatial filters are separable: They can be 
implemented using horizontal and vertical filters in 
tandem. Many spatial filters are inseparable: Their 
computation must take place directly in the two-dimen- 
sional spatial domain; they cannot be implemented 
using cascaded one-dimensional horizontal and vertical 
filters. Examples of inseparable filters are given in the 
matrices in Figure 18.11. 
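A separable kernel is the outer product of its vertical and horizontal filters, so it has rank one; that gives a quick exact test for integer kernels. The sketch uses hypothetical names, and the cross-shaped kernel in the example is an illustrative choice, not one of the matrices of Figure 18.11.

```python
def outer(column, row):
    """2-D kernel equivalent to a vertical filter `column` cascaded with a
    horizontal filter `row`."""
    return [[c * r for r in row] for c in column]


def is_separable(kernel):
    """True if the kernel has rank at most one (every 2x2 minor is zero),
    i.e. it can be written as an outer product of two 1-D filters."""
    rows, cols = len(kernel), len(kernel[0])
    for i in range(rows):
        for j in range(i + 1, rows):
            for k in range(cols):
                for m in range(k + 1, cols):
                    if kernel[i][k] * kernel[j][m] != kernel[i][m] * kernel[j][k]:
                        return False
    return True


print(outer([1, 1], [1, 1]))          # [[1, 1], [1, 1]]
print(outer([1, 2, 1], [1, 2, 1]))    # [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
print(is_separable([[0, 1, 0], [1, 1, 1], [0, 1, 0]]))  # False
```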
Schreiber, William F., and Donald E. 
Troxel, "Transformations Between 
Continuous and Discrete Represen- 
tations of Images: A Perceptual 
Approach," in IEEE Tr. on Pattern 
Analysis and Machine Intelligence, 
PAMI-7 (2): 178-186 (Mar. 1985). 

Image presampling filters 

In a video camera, continuous information must be 
subjected to a presampling ("antialiasing") filter. 

Aliasing is minimized by optical spatial lowpass filtering 
that is effected in the optical path, prior to conversion 
of the image signal to electronic form. MTF limitations 
in the lens impose some degree of filtering. An addi- 
tional filter can be implemented as a discrete optical 
element (often employing the optical property of bire- 
fringence). Additionally, or alternatively, some degree of 
filtering may be imposed by optical properties of the 
photosensor itself. 

In resampling, signal power is not constrained to 
remain positive; filters having negative weights can be 
used. The ILPF and other sinc-based filters have nega- 
tive weights, but those filters often ring and exhibit 
poor visual performance. Schreiber and Troxel found 
well-designed sharpened Gaussian filters with σ = 0.375 
to have superior performance to the ILPF. A filter that is 
optimized for a particular mathematical criterion does 
not necessarily produce the best-looking picture! 

Image reconstruction filters 

On page 43, I introduced "box filter" reconstruction. 
This is technically known as sample-and-hold, zero-order 
hold, or nearest-neighbor reconstruction. 

In theory, ideal image reconstruction would be 
obtained by using a PSF which has a two-dimensional 
sinc distribution. This would be a two-dimensional 
version of the ideal lowpass filter (ILPF) that I described 
for one dimension on page 148. However, a sinc func- 
tion involves negative excursions. Light power cannot 
be negative, so a sinc filter cannot be used for presam- 
pling at an image capture device, and cannot be used as 
a reconstruction filter at a display device. A box-shaped 
distribution of sensitivity across each element of 
a sensor is easily implemented, as is a box-shaped 
distribution of intensity across each pixel of a display. 
However, like the one-dimensional boxcar of 
Chapter 16, a box distribution has significant response 
at high frequencies. Used at a sensor, a box filter will 
permit aliasing. Used in a display, scan-line or pixel 
structure is likely to be visible. If an external optical 
element such as a lens attenuates high spatial frequen- 
cies, then a box distribution might be suitable. A simple 
and practical choice for either capture or reconstruc- 
tion is a Gaussian having a judiciously chosen half- 
power width. A Gaussian is a compromise that can 
achieve reasonably high resolution while minimizing 
aliasing and minimizing the visibility of the pixel (or 
scan-line) structure. 

A raised cosine distribution is roughly 
similar to a Gaussian. See page 542. 

Schreiber and Troxel suggest recon- 
struction with a sharpened Gaussian 
having σ = 0.3. See their paper cited 
in the marginal note on page 192. 
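For a Gaussian PSF exp(-x²/(2σ²)), both quantities a designer trades off have closed forms: the MTF is exp(-2π²σ²f²), and the full width at which the PSF falls to half its peak is 2σ√(2 ln 2) ≈ 2.355σ. The sketch assumes σ is measured in sample pitches, f in cycles per sample, and that "half-power width" means the full width at half maximum of the intensity PSF; the function names are hypothetical.

```python
import math


def gaussian_mtf(sigma, f):
    """MTF of a Gaussian PSF exp(-x^2 / (2 sigma^2)); sigma in sample
    pitches, f in cycles per sample."""
    return math.exp(-2 * (math.pi * sigma * f) ** 2)


def half_power_width(sigma):
    """Full width of the PSF where it falls to half its peak value."""
    return 2 * sigma * math.sqrt(2 * math.log(2))


for sigma in (0.3, 0.4, 0.5, 0.6):
    print(sigma, round(half_power_width(sigma), 3),
          round(gaussian_mtf(sigma, 0.25), 3), round(gaussian_mtf(sigma, 0.5), 3))
```

Widening the Gaussian lowers its response at the Nyquist frequency (0.5 cycles per sample), which reduces aliasing and structure visibility, but it also lowers the response at mid frequencies, which costs resolution.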



Spatial (2-D) oversampling 

In image capture, as in reconstruction for image display, 
ideal theoretical performance would be obtained by 
using a PSF with a sinc distribution. However, a sinc 
function cannot be used directly in a transducer of light, 
because light power cannot be negative: Negative 
weights cannot be implemented. As in display recon- 
struction, a simple and practical choice for a direct 
presampling or reconstruction filter is a Gaussian having 
a judiciously chosen half-power width. 



I have been describing direct sensors, where samples 
are taken directly from sensor elements, and direct 
displays, where samples directly energize display 
elements. In Oversampling, on page 174, I described 
a technique whereby a large number of directly 
acquired samples can be filtered to a lower sampling 
rate. That section discussed downsampling in one 
dimension, with the main goal of reducing the 
complexity of analog presampling or reconstruction 
filters. The oversampling technique can also be applied 
in two dimensions: A sensor can directly acquire a fairly 
large number of samples using a crude optical presam- 
pling filter, then use a sophisticated digital spatial filter 
to downsample. 



The advantage of interlace (reducing scan-line visi- 
bility for a given bandwidth, spatial resolution, and 
flicker rate) is built upon the assumption that the 
sensor (camera), data transmission, and display all use 
identical scanning. If oversampling is feasible, the situa- 
tion changes. Consider a receiver that accepts progres- 
sive image data (as in the top left of Figure 6.8, on 
page 59), but instead of displaying this data directly, it 
synthesizes data for a larger image array (as in the 
middle left of Figure 6.8). The synthetic data can be 
displayed with a spot size appropriate for the larger 
array, and all of the scan lines can be illuminated in 
each 1/60 s instead of just half of them. This technique is 
spatial oversampling. For a given level of scan-line visi- 
bility, this technique enables closer viewing distance 
than would be possible for progressive display. 

Oversampling to double the number 
of lines displayed during a frame 
time is called line doubling. 

If such oversampling had been technologically feasible 
in 1941, or in 1953, then the NTSC would have 
undoubtedly chosen a progressive transmission stan- 
dard. However, oversampling is not economical even in 
today's SDTV studio systems, let alone HDTV or 
consumer electronics. So interlace continues to have an 
economic advantage. However, this advantage is 
eroding. It is likely that all future video system stan- 
dards will have progressive scanning. 

Oversampling provides a mechanism for a sensor PSF or 
a display PSF to have negative weights, yielding a 
spatially "sharpened" filter. For example, a sharpened 
Gaussian PSF can be obtained, and can achieve perfor- 
mance better than a Gaussian. With a sufficient degree 
of oversampling, using sophisticated filters having 
sinc-like PSFs, the interchange signal can come arbitrarily 
close to the Nyquist limit. However, mathematical 
excellence does not necessarily translate to improved 
visual performance. Sharp filters are likely to ring, and 
thereby produce objectionable artifacts. 

If negative weights are permitted in a PSF, then nega- 
tive signal values can potentially result. Standard studio 
digital interfaces provide footroom so as to permit 
moderate negative values to be conveyed. Using nega- 
tive weights typically improves filter performance even 
if negative values are clipped after downsampling. 

Similarly, if a display has many elements for each digital 
sample, a sophisticated digital upsampler can use nega- 
tive weights. Negative values resulting from the filter's 
operation will be clipped for presentation to the display 
itself, but again, improved performance could result. 
Boynton, Robert M., Human 
Color Vision (New York: Holt, 
Rinehart and Winston, 1979). 

Wandell, Brian A., Foundations 
of Vision (Sunderland, Mass.: 
Sinauer Associates, 1995). 



Properties of human vision are central to image system 
engineering. They determine how many bits are neces- 
sary to represent luminance (or tristimulus) levels, and 
how many pixels need to be provided per degree of 
picture angle. This chapter introduces the intensity 
discrimination and spatial properties of vision. 

Retina 

The human retina has four types of photoreceptor cells 
that respond to incident radiation with different spec- 
tral response curves. A retina has about 100 million rod 
cells, and about 5 million cone cells (of three types). 

Rods are effective only at extremely low light levels. 
Since there is only one type of rod cell, what is loosely 
called night vision cannot discern colors. 

The cone cells are sensitive to longwave, mediumwave, 
and shortwave light - roughly, light in the red, green, 
and blue portions of the spectrum. Because there are 
just three types of color photoreceptors, three numer- 
ical components are necessary and sufficient to describe 
a color: Color vision is inherently trichromatic. To 
arrange for three components to mimic color vision, 
suitable spectral sensitivity functions must be used; this 
topic will be discussed in The CIE system of colorimetry, 
on page 211.
Figure 19.1 Luminance range of vision

Figure 19.2 Adaptation



Adaptation 

Vision operates over a remarkable range of luminance 
levels - about eight orders of magnitude (decades), as
sketched in Figure 19.1. For about four decades at the
low end of the range, the rods are active; vision at these 
light levels is called scotopic. For the top five or six 
decades, the cones are active; vision at these light levels 
is called photopic. 

Mesopic vision takes place in the range of luminance 
levels where there is some overlap between rods and 
cones. Considered from the bottom of the photopic 
region, this is called rod intrusion. It is a research topic 
whether the rods have significance to color image 
reproduction at usual luminance levels (such as in the 
cinema). Today, for engineering purposes, the effect of 
rod intrusion is discounted. 

Vision adapts throughout this luminance range, as 
sketched in Figure 19.2. From sunlight to moonlight, 
illuminance changes by a factor of about 200000; adap- 
tation causes the sensitivity of the visual system to 
increase by about a factor of 1000. About one decade 
of adaptation is effected by the eye's iris - that is, by 
changes in pupil diameter. (Pupil diameter varies from 
about 2 mm to 8 mm.) Adaptation involves
a photochemical process in the visual pigment
substance contained in the rods and the cones; it also
involves neural mechanisms in the visual pathway.

Dark adaptation, to low luminance, is slow: Adaptation 
from a bright sunlit day to the darkness of a cinema can 
take a few minutes. Adaptation to higher luminance is 
rapid but can be painful, as you may have experienced 
when walking out of the cinema back into daylight. 

Adaptation is a low-level phenomenon within the visual 
system; it is mainly controlled by total retinal illumina- 
tion. Your adaptation state is closely related to the 
mean luminance in your field of view. In a dark viewing 
environment, such as a cinema, the image itself controls 
adaptation. 
Diffuse white was described on
page 83. This wide range of 
luminance levels is sometimes 
called dynamic range, but 
nothing is in motion! 



Simultaneous contrast ratio is 
sometimes shortened to simulta- 
neous contrast, which unfortu- 
nately has a second (unrelated) 
meaning. See Surround effect, on 
page 82. Contrast ratio without 
qualification should be taken as 
simultaneous. 
At a particular state of adaptation, vision can discern
different luminances across about a 1000:1 range. 

When viewing a real scene, adaptation changes 
depending upon where in the scene your gaze is 
directed. 

For image reproduction purposes, vision can distin- 
guish different luminances down to about 1% of diffuse 
white; in other words, our ability to distinguish lumi- 
nance differences extends over a ratio of luminance of 
about 100:1. Loosely speaking, luminance levels less 
than 1% of peak white appear just "black": Different 
luminances below that level can be measured, but they 
cannot be visually distinguished. 

Contrast ratio 

Contrast ratio is the ratio of luminances of the lightest 
and darkest elements of a scene, or an image. In print 
and photography, the term need not be qualified. 
However, image content in motion picture film and 
video changes with time. Simultaneous contrast ratio (or 
on-off contrast ratio) refers to contrast ratio at one 
instant. Sequential contrast ratio measures light and dark 
elements that are separated in time - that is, not part of 
the same picture. Sequential contrast ratio in film can 
reach 10000:1. Such a high ratio may be useful to 
achieve an artistic effect, but performance of a display 
system is best characterized by simultaneous contrast 
ratio. 

In practical imaging systems, many factors conspire to 
increase the luminance of black, thereby lessening the 
contrast ratio and impairing picture quality. On an elec- 
tronic display or in a projected image, simultaneous 
contrast ratio is typically less than 100:1 owing to spill 
light (stray light) in the ambient environment or flare in 
the display system. Typical simultaneous contrast ratios 
are shown in Table 19.1 overleaf. Contrast ratio is 
a major determinant of subjective image quality, so 
much so that an image reproduced with a high simulta- 
neous contrast ratio may be judged sharper than 
another image that has higher measured spatial 
frequency content. 
Viewing environment        Max. luminance,   Typical simul.    L* range
                           cd·m⁻²            contrast ratio
Cinema                      40               80:1              11...100
Television (living room)   100               20:1              27...100
Office                     200                5:1              52...100

Table 19.1 Typical simultaneous contrast ratios in image display are summarized.
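As a cross-check on the L* range column of Table 19.1, the Python sketch below computes the L* of the darkest reproducible level from a contrast ratio. It assumes that black is simply 1/contrast-ratio of reference white, and it uses only the cube-root branch of the CIE L* formula given in Chapter 20, which is valid while black stays above the breakpoint 0.008856 (contrast ratios below about 113:1). The function name is a hypothetical helper for illustration.

```python
def black_lstar(contrast_ratio: float) -> float:
    """L* of the darkest level when white is 1 and black is 1/contrast_ratio.

    Assumption: only the cube-root branch of CIE L* is used, so contrast
    ratios high enough to push black into the linear segment are rejected.
    """
    y = 1.0 / contrast_ratio  # relative luminance of black
    if y <= 0.008856:
        raise ValueError("black falls in the linear segment of L*")
    return 116.0 * y ** (1.0 / 3.0) - 16.0


for name, ratio in [("Cinema", 80), ("Television", 20), ("Office", 5)]:
    print(f"{name:10s} {ratio:3d}:1  L* {black_lstar(ratio):5.1f} to 100")
```

Rounded to whole numbers, the three results agree with the 11, 27, and 52 lower limits in the table.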



During the course of the day we experience a wide 
range of illumination levels; adaptation adjusts accord- 
ingly. But in video and film, we are nearly always 
concerned with viewing at a known adaptation state, so 
a simultaneous contrast ratio of 100:1 is adequate. 

Contrast sensitivity 

Within the two-decade range of luminance that is 
useful for image reproduction, vision has a certain 
threshold of discrimination. It is convenient to express 
the discrimination capability in terms of contrast sensi- 
tivity, which is the ratio of luminances between two 
adjacent patches of similar luminance. 



Y₀: Adaptation (surround)
luminance
Y: Test luminance
ΔY: Increment in test luminance



Figure 19.3 below shows the pattern presented to an 
observer in an experiment to determine the contrast 
sensitivity of human vision. Most of the observer's field 
of vision is filled by a surround luminance level, Y₀,
which fixes the observer's state of adaptation. In the 
central area of the field of vision are placed two 
adjacent patches having slightly different luminance 



Figure 19.3 Contrast sensitivity test pattern is
presented to an observer in an experiment to
determine the contrast sensitivity of human vision.
The experimenter adjusts ΔY; the observer reports
whether he or she detects a difference in lightness
between the two patches.
Figure 19.4 Contrast sensitivity. This graph is redrawn, with permission, from Figure 3.4 of
Schreiber's Fundamentals of Electronic Imaging Systems. Over a range of intensities of about 300:1, 
the discrimination threshold of vision is approximately a constant ratio of luminance. The flat 
portion of the curve shows that the perceptual response to luminance - termed lightness - is 
approximately logarithmic. 



Schreiber, William F., Funda- 
mentals of Electronic Imaging 
Systems, Third Edition (Berlin: 
Springer-Verlag, 1993). 



levels, Y and Y + ΔY. The experimenter presents stimuli
having a wide range of test values with respect to the
surround, that is, a wide range of Y/Y₀ values. At each
test luminance, the experimenter presents to the
observer a range of luminance increments with respect
to the test stimulus, that is, a range of ΔY/Y values.



$$\frac{\lg 100}{\lg 1.01} \approx 463; \qquad 1.01^{463} \approx 100$$



NTSC documents from the early
1950s used a contrast sensitivity
of 2% and a contrast ratio of 30:1
to derive 172 steps:

$$\frac{\lg 30}{\lg 1.02} \approx 172$$



See Fink, Donald G., ed., Color 
Television Standards (New York: 
McGraw-Hill, 1955), p. 201. 



When this experiment is conducted, the relationship
graphed in Figure 19.4 above is found: Plotting
log(ΔY/Y) as a function of log Y reveals an interval of
more than two decades of luminance over which the
discrimination capability of vision is about 1% of the
test luminance level. This leads to the conclusion that,
for threshold discrimination of two adjacent patches of
nearly identical luminance, the discrimination
capability is very nearly logarithmic.

The contrast sensitivity function begins to answer this 
question: What is the minimum number of discrete 
codes required to represent relative luminance over 
a particular range? In other words, what luminance 
codes can be thrown away without the observer 
noticing? On a linear luminance scale running from 1 to 100,
covering the 100:1 range with an increment of 0.01 (1% of
the darkest level) takes about 10000 codes, or
about 14 bits. If codes are spaced according to a ratio
of 1.01, then only about 463 codes are required. This 
number of codes can be represented in 9 bits. (For 
video distribution, 8 bits suffice.) 
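The code counts in this paragraph, and the NTSC figure of 172 steps quoted in the margin, can be reproduced with a few lines of Python. This is a sketch under the assumptions stated above: the linear scale runs from 1 to 100 with a fixed step of 0.01, while the logarithmic scale uses a constant step ratio. The helper names are illustrative only.

```python
import math


def linear_codes(contrast_ratio: float, step: float) -> float:
    """Codes needed on a linear scale from 1 to contrast_ratio with a fixed step."""
    return (contrast_ratio - 1.0) / step


def ratio_codes(contrast_ratio: float, step_ratio: float) -> float:
    """Codes needed when each code is step_ratio times the one below it."""
    return math.log10(contrast_ratio) / math.log10(step_ratio)


def bits_for(codes: float) -> int:
    """Smallest whole number of bits whose code count covers `codes`."""
    return math.ceil(math.log2(codes))


print(linear_codes(100, 0.01), bits_for(linear_codes(100, 0.01)))  # about 10000 codes
print(round(ratio_codes(100, 1.01)), bits_for(ratio_codes(100, 1.01)))
print(round(ratio_codes(30, 1.02)))  # NTSC-era figure
```

The ratio-spaced count grows with the logarithm of the contrast ratio, whereas the linear count grows with the contrast ratio itself; that difference is the whole case for nonlinear coding.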
ISO 5-1, Photography - Density
measurements - Terms, symbols,
and notations. See also parts 2
through 4.



SMPTE 180M, File Format for Digital 
Moving-Picture Exchange (DPX). 



In transmissive film media, transmittance (τ) is the fraction
of light transmitted through the medium. Transmittance
is usually measured in logarithmic units: Optical
density - or just density - is the negative of the loga- 
rithm of transmittance. (Equivalently, optical density is 
the logarithm of incident power divided by transmitted 
power.) The term stems from the physical density of 
developed silver (or in color film, developed dye) in the 
film. In reflective media, reflectance (ρ) is similarly
expressed in density units. In motion picture film, loga- 
rithms are used not only for measurement, but also for 
image coding (in the Kodak Cineon system, and the 
SMPTE DPX standard). 
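The two equivalent definitions of optical density above can be sketched in Python as follows. The function names are hypothetical; the same arithmetic applies to reflectance ρ in reflective media.

```python
import math


def density_from_transmittance(tau: float) -> float:
    """Optical density D = -log10(tau), for transmittance tau in (0, 1]."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("transmittance must lie in (0, 1]")
    return -math.log10(tau)


def density_from_power(incident: float, transmitted: float) -> float:
    """Equivalent form: log10 of incident power divided by transmitted power."""
    return math.log10(incident / transmitted)


print(density_from_transmittance(0.01))  # 1% transmittance
print(density_from_power(100.0, 1.0))    # the same ratio, expressed as powers
```

A density of 2 thus corresponds to a 100:1 ratio, the same two decades discussed under contrast sensitivity.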



When two stimuli differ by 
1 JND, 75% of guesses will be 
right and 25% will be wrong. 



Stevens, S.S., Psychophysics 
(New York: Wiley, 1975). 



The logarithmic relationship relates to contrast sensi- 
tivity at threshold: We are measuring the ability of the 
visual system to discriminate between two nearly iden- 
tical luminances. If you like, call this a just-noticeable 
difference (JND), defined where the difference between 
two stimuli is detected as often as it is undetected. 
Logarithmic coding rests on the assumption that the 
threshold function can be extended to large luminance 
ratios. Experiments have shown that this assumption 
does not hold very well. At a given state of adaptation, 
the discrimination capability of vision degrades at low 
luminances, below several percent of diffuse white. 
Over a wider range of luminance, strict adherence to 
logarithmic coding is not justified for perceptual 
reasons. Coding based upon a power law is found to be 
a better approximation to lightness response than 
a logarithmic function. In video, and in computing, 
power functions are used instead of logarithmic func- 
tions. Incidentally, other senses behave according to 
power functions, as shown in Table 19.2. 



Percept      Physical quantity                        Power
Loudness     Sound pressure level                     0.67
Saltiness    Sodium chloride concentration            1.4
Smell        Concentration of aromatic molecules      0.6

Table 19.2 Power functions in perception
Figure 19.5 Contrast sensitivity function (CSF)
varies with retinal illuminance, here shown in units
of troland (Td). The curve at 9 Td, which typifies
television viewing, peaks at about 4 cycles per degree
(CPD, or c/°). Below that spatial frequency, the eye
acts as a differentiator; above it, the eye acts as
an integrator.
van Nes, F.L., and M.A. Bouman,
“Spatial modulation transfer in the
human eye,” in J. Opt. Soc. Am.
57: 419-423 (1967).



Contrast sensitivity function (CSF) 

The contrast sensitivity of vision is about 1% - that is,
vision cannot distinguish two luminance levels if the
ratio between them is less than about 1.01. That
threshold applies to visual features of a certain angular
extent, about ⅛°, for which vision has maximum ability
to detect luminance differences. However, the contrast
sensitivity of vision degrades for elements having
angular subtense smaller or larger than about ⅛°.



Barten, Peter G.J., Contrast Sensi- 
tivity of the Human Eye and its 
Effect on Image Quality (Knegsel, 
Netherlands: HV Press, 1999). 



In vision science, rather than characterizing vision by its 
response to an individual small feature, we place many 
small elements side by side. The spacing of these 
elements is measured in terms of spatial frequency, in 
units of cycles per degree. Each cycle comprises a dark 
element and a white element. At the limit, a cycle 
comprises two samples or two pixels; in the vertical 
dimension, the smallest cycle corresponds to two scan 
lines. 
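To relate a display's line count to the cycles-per-degree axis used in vision science, the sketch below estimates the highest vertical spatial frequency a display can present. It follows the text in treating two scan lines as the smallest cycle, and it additionally assumes that the picture angle is shared evenly among the lines (an approximation, since lines near the picture edge subtend slightly less angle). Viewing distance is expressed in picture heights; the 1080-line, three-picture-height example is an illustrative choice, not a figure from this chapter.

```python
import math


def picture_angle_deg(distance_in_heights: float) -> float:
    """Vertical angle, in degrees, subtended by a picture viewed at the given
    distance (expressed as a multiple of picture height)."""
    return math.degrees(2.0 * math.atan(0.5 / distance_in_heights))


def max_cycles_per_degree(lines: int, distance_in_heights: float) -> float:
    """Highest vertical spatial frequency: two scan lines make one cycle, and the
    lines are assumed to be spread uniformly over the picture angle."""
    lines_per_degree = lines / picture_angle_deg(distance_in_heights)
    return lines_per_degree / 2.0


print(f"{max_cycles_per_degree(1080, 3.0):.1f} cycles per degree")
```

Doubling the viewing distance roughly halves the picture angle, and so roughly doubles the spatial frequency of the finest detail the display can present.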



Troland (Td) is a unit of retinal
illuminance equal to object luminance
(in cd·m⁻²) times pupillary aperture
area (in mm²).



Figure 19.5 above shows a graph of the dependence of 
contrast sensitivity (on the y-axis, expressed in
percentage) upon spatial frequency (on the x-axis, 
expressed in cycles per degree). The graph shows 
a family of curves, representing different adaptation 
levels, from very dark (0.0009 Td) to very bright 
(900 Td). The curve at 90 Td is representative of elec- 
tronic or projected displays. 
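Because the troland is defined in the margin as a simple product, retinal illuminance is easy to estimate. The sketch below assumes a circular pupil; the 100 cd·m⁻² luminance and 3 mm pupil are illustrative values, while the 2 mm and 8 mm diameters come from the pupil range quoted under Adaptation. The function name is hypothetical.

```python
import math


def trolands(luminance_cd_m2: float, pupil_diameter_mm: float) -> float:
    """Retinal illuminance in trolands: luminance (cd/m^2) times pupil area (mm^2),
    assuming a circular pupil."""
    pupil_area_mm2 = math.pi * (pupil_diameter_mm / 2.0) ** 2
    return luminance_cd_m2 * pupil_area_mm2


print(f"{trolands(100.0, 3.0):.0f} Td")          # 100 cd/m^2 through a 3 mm pupil
print(trolands(1.0, 8.0) / trolands(1.0, 2.0))  # iris alone: ratio of pupil areas
```

The second line shows that the iris alone changes retinal illuminance by a factor of 16, a little more than one decade, consistent with the text's account of the iris's share of adaptation.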
For video engineering, three features of this graph are
important: 

• First, the 90 Td curve has fallen to a contrast sensitivity
of 100% at about 60 cycles per degree. Vision isn't
capable of perceiving spatial frequencies greater than
this; a display need not reproduce detail higher than
this frequency. This limits the resolution (or bandwidth)
that must be provided.

• Second, the peak of the 90 Td curve has a contrast
sensitivity of about 1%; luminance differences less than
this can be discarded. This limits the number of bits per
pixel that must be provided.

• Third, the curve falls off at spatial frequencies below 
about one cycle per degree. In a consumer display, 
luminance can diminish (within limits) toward the edges 
of the image without the viewer's noticing. 

Campbell, F.W., and J.G. Robson,
“Application of Fourier analysis to the
visibility of gratings,” in J. Physiol.
(London) 197: 551-566 (1968).



In traditional video engineering, the spatial frequency 
and contrast sensitivity aspects of this graph are used 
independently. The JPEG and MPEG compression 
systems exploit the interdependence of these two 
aspects, as will be explained in JPEG and motion-JPEG 
(M-JPEG) compression, on page 447. 






DIGITAL VIDEO AND HDTV ALGORITHMS AND INTERFACES 



Luminance and lightness 20 



In Color science for video, on 
page 233, I will describe how 
spectral power distributions 
(SPDs) in the range 400 nm to 
700 nm are related to colors. 



Nonlinear coding of luminance is essential to maximize 
the perceptual performance of an image coding system. 
This chapter introduces luminance and lightness, or 
what is loosely called brightness. 

Luminance, denoted Y, is what I call a linear-light quan- 
tity; it is directly proportional to physical intensity 
weighted by the spectral sensitivity of human vision. 
Luminance involves light having wavelengths in the 
range of about 400 nm to 700 nm; luminance can be 
computed as a properly weighted sum of linear-light 
red, green, and blue tristimulus components, according 
to the principles and standards of the CIE. 



Lightness, denoted L*, is defined by the CIE as a 
nonlinear transfer function of luminance that approxi- 
mates the perception of brightness. 



The term luminance is often care- 
lessly and incorrectly used to refer 
to luma. See Relative luminance, 
on page 206, and Appendix A, 
YUV and luminance considered 
harmful, on page 595. 



In video, we do not compute the linear-light luminance 
of color science; nor do we compute lightness. Instead, 
we compute an approximation of lightness, luma 
(denoted Y') as a weighted sum of nonlinear (gamma-corrected)
R', G', and B' components. Luma is only
loosely related to true (CIE) luminance. In Constant 
luminance, on page 75, I explained why video systems 
approximate lightness instead of computing it directly. 

I will detail the nonlinear coding used in video in 
Gamma, on page 257. In Luma and color differences, on 
page 281, I will outline how luma is augmented with 
color information. 
See Introduction to radiometry
and photometry, on page 601.



Radiance, intensity 

Image science concerns optical power incident upon 
the image plane of a sensor device, and optical power 
emergent from the image plane of a display device. 

Radiometry concerns the measurement of radiant 
optical power in the electromagnetic spectrum from 
3×10¹¹ Hz to 3×10¹⁶ Hz, corresponding to wavelengths
from 1 mm down to 10 nm. There are four
fundamental quantities in radiometry: 

• Radiant optical power, flux, is expressed in units of
watts (W).

• Radiant flux per unit area is irradiance; its units are
watts per meter squared (W·m⁻²).

• Radiant flux in a certain direction - that is, radiant flux
per unit of solid angle - is radiant intensity; its units are
watts per steradian (W·sr⁻¹).

• Flux in a certain direction, per unit area, is radiance;
its units are watts per steradian per meter squared
(W·sr⁻¹·m⁻²).



Radiance is measured with an instrument called a radi- 
ometer. A spectroradiometer measures spectral 
radiance - that is, radiance per unit wavelength. A 
spectroradiometer measures incident light; a 
spectrophotometer incorporates a light source, and 
measures either spectral reflectance or spectral trans- 
mittance. 

Photometry is essentially radiometry as sensed by 
human vision: In photometry, radiometric measure- 
ments are weighted by the spectral response of human 
vision (to be described). This involves wavelengths (λ)
between 360 nm and 830 nm, or in practical terms,
400 nm to 700 nm. Each of the four fundamental
quantities of radiometry - flux, irradiance, radiant intensity,
and radiance - has an analog in photometry. The photo- 
metric quantities are luminous flux, illuminance, lumi- 
nous intensity, and luminance. In video engineering, 
luminance is the most important of these. 
Figure 20.1 Luminous efficiency functions. The solid line indicates the luminance response of the
cone photoreceptors - that is, the CIE photopic response. A monochrome scanner or camera must
have this spectral response in order to correctly reproduce lightness. The peak occurs at about
555 nm, the wavelength of the brightest possible monochromatic 1 mW source. (The lightly
shaded curve shows the scotopic response of the rod cells - loosely, the response of night vision.
The increased relative luminance of blue wavelengths in scotopic vision is called the Purkinje shift.)



Luminance 

I presented a brief introduction to
Lightness terminology on page 11.

The Commission Internationale de L'Eclairage (CIE, or
International Commission on Illumination) is the
international body responsible for standards in the area of
color. The CIE defines brightness as the attribute of
a visual sensation according to which an area appears to
exhibit more or less light. Brightness is, by the CIE's
definition, a subjective quantity: It cannot be measured.



Publication CIE 15.2, Colorimetry, 
Second Edition (Vienna, Austria: 
Commission Internationale de 
L'Eclairage, 1986); reprinted with 
corrections in 1996. 



Until 2000, Y(λ) had the symbol
ȳ, pronounced WYE-bar. The
luminous efficiency function has
also been denoted V(λ),
pronounced VEE-lambda.



The CIE has defined an objective quantity that is related 
to brightness. Luminance is defined as radiance 
weighted by the spectral sensitivity function - the sensi- 
tivity to power at different wavelengths - that is charac- 
teristic of vision. The luminous efficiency of the CIE 
Standard Observer, denoted Y(λ), is graphed as the
black line of Figure 20.1 above. It is defined numeri- 
cally, is everywhere positive, and peaks at about 
555 nm. When a spectral power distribution (SPD) is 
integrated using this weighting function, the result is 
luminance, denoted Y. In continuous terms, luminance 
is an integral of spectral radiance across the spectrum. 

In discrete terms, it is a dot product. The magnitude of 
SMPTE RP 71, Setting Chromaticity
and Luminance of White for Color
Television Monitors Using
Shadow-Mask Picture Tubes.


luminance is proportional to physical power; in that 
sense it is like intensity. However, its spectral composi- 
tion is intimately related to the lightness sensitivity of 
human vision. Luminance is expressed in units of 
cd·m⁻² ("nits"). Relative luminance, which I will
describe in a moment, is a pure number without units. 
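The dot-product form of luminance mentioned above can be sketched in Python as follows. The constant 683 lm/W is the maximum luminous efficacy that converts radiometric to photometric units. The three-sample arrays in the usage line are made-up toy values, not the CIE Y(λ) table; in real use you would substitute the tabulated luminous efficiency function, sampled at the same wavelengths as the spectral radiance.

```python
from typing import Sequence

K_M = 683.0  # lm/W: maximum luminous efficacy of photopic vision


def luminance(spectral_radiance: Sequence[float],
              ybar: Sequence[float],
              step_nm: float) -> float:
    """Luminance in cd/m^2 from spectral radiance in W/(sr*m^2*nm), sampled at the
    same wavelengths as the luminous efficiency function ybar, spaced step_nm apart.
    The weighted sum (a dot product) approximates the integral over wavelength."""
    if len(spectral_radiance) != len(ybar):
        raise ValueError("spectral_radiance and ybar must have the same length")
    dot = sum(s * y for s, y in zip(spectral_radiance, ybar))
    return K_M * dot * step_nm


# Toy, hypothetical samples (NOT CIE data), just to exercise the arithmetic.
toy_ybar = [0.5, 1.0, 0.5]
print(luminance([0.01, 0.01, 0.01], toy_ybar, 10.0))
```

For relative luminance, the constant cancels: divide the result for a stimulus by the result for reference white.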

The luminous efficiency function is also known as the
Y(λ) color-matching function (CMF). Luminance, Y, is
one of three distinguished tristimulus values. The other 
two distinguished tristimulus values, X and Z, and 
various R, G, and B tristimulus values, will be intro- 
duced in Color science for video, on page 233. 

You might intuitively associate pure luminance with 
gray, but a spectral power distribution having the shape 
of Figure 20.1 would not appear neutral gray! In fact, an 
SPD of that shape would appear distinctly green. As 
I will detail in The CIE system of colorimetry, on
page 211, it is very important to distinguish analysis 
functions - called color-matching functions, or CMFs - 
from spectral power distributions. The luminous effi- 
ciency function takes the role of an analysis function, 
not an SPD. 

Relative luminance 

In image reproduction - including photography, cinema, 
video, and print - we rarely, if ever, reproduce the
absolute luminance of the original scene. Instead, we
reproduce luminance approximately proportional to scene
luminance, up to the maximum luminance available in 
the reproduction medium. We process or record an 
approximation to relative luminance. To use the unqual- 
ified term luminance would suggest that we are 
processing or recording absolute luminance. 

In image reproduction, luminance is usually normalized 
to 1 or 100 units relative to a specified or implied
reference white; we assume that the viewer will adapt to
white in his or her ambient environment. SMPTE has
standardized studio video monitors to have a reference
white luminance of 103 cd·m⁻², and a reference white
chromaticity of CIE D65. (I will introduce CIE D65 on
page 224.)
Luminance from red, green, and blue

The luminous efficiency of vision peaks in the medium- 
wave (green) region of the spectrum: If three mono- 
chromatic sources appear red, green, and blue, and 
have the same radiant power in the visible spectrum, 
then the green will appear the brightest of the three; 
the red will appear less bright, and the blue will be the 
darkest of the three. As a consequence of the luminous 
efficiency function, the most saturated blue colors are 
quite dark, and the most saturated yellows are quite 
light. 

If the luminance of a scene element is to be sensed by 
a scanner or camera having a single spectral filter, then 
the spectral response of the scanner's filter must - in 
theory, at least - correspond to the luminous efficiency 
function of Figure 20.1. However, luminance can also be
computed as a weighted sum of suitably chosen red, 
green, and blue tristimulus components. The coeffi- 
cients are functions of vision, of the white reference, 
and of the particular red, green, and blue spectral 
weighting functions employed. For realistic choices of 
white point and primaries, the green coefficient is quite 
large, the blue coefficient is the smallest of the three, 
and the red coefficient has an intermediate value. 



The primaries of contemporary CRT displays are stan- 
dardized in Rec. ITU-R BT.709. Weights computed from 
these primaries are appropriate to compute relative 
luminance from red, green, and blue tristimulus values 
for computer graphics, and for modern video cameras 
and modern CRT displays in both SDTV and HDTV:



My notation is outlined in 
Figure 24.5, on page 289. The 
coefficients are derived in Color 
science for video, on page 233. 



$${}^{709}Y = 0.2126\,R + 0.7152\,G + 0.0722\,B \qquad \text{Eq 20.1}$$

Luminance comprises roughly 21% power from the red 
(longwave) region of the spectrum, 72% from green 
(mediumwave), and 7% from blue (shortwave). 
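Eq 20.1 translates directly into code. The Python sketch below assumes linear-light BT.709 R, G, and B tristimulus values normalized so that reference white is (1, 1, 1); it does not apply to gamma-corrected R'G'B'. For contrast, it also computes the naive (R + G + B)/3 average criticized in the margin. Both function names are hypothetical.

```python
def relative_luminance_709(r: float, g: float, b: float) -> float:
    """Relative luminance (Eq 20.1) from linear-light BT.709 R, G, B
    (not gamma-corrected R'G'B')."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def naive_average(r: float, g: float, b: float) -> float:
    """(R + G + B) / 3, which ignores the spectral response of vision."""
    return (r + g + b) / 3.0


for name, rgb in [("white", (1, 1, 1)), ("green", (0, 1, 0)), ("blue", (0, 0, 1))]:
    print(f"{name:5s}  Y = {relative_luminance_709(*rgb):.4f}"
          f"  average = {naive_average(*rgb):.4f}")
```

The naive average credits saturated blue with about a third of white's luminance; Eq 20.1 gives about 7%, which is why the most saturated blues look dark.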



To compute luminance using
(R + G + B)/3 is at odds with the
spectral response of vision.



Blue has a small contribution to luminance. However, 
vision has excellent color discrimination among blue 
hues. If you give blue fewer bits than red or green, then 
blue areas of your images are liable to exhibit 
contouring artifacts. 
Figure 20.2 Luminance and lightness. The dependence of lightness (L*) or value (V) upon relative
luminance (Y) has been modeled by polynomials, power functions, and logarithms. In all of
these systems, 18% "mid-gray" has lightness about halfway up the perceptual scale. For details,
see Fig. 2 (6.3) in Wyszecki and Stiles, Color Science (cited on page 231).



The L* symbol is pronounced 
EL-star. 



Lightness (CIE L*)

In Contrast sensitivity, on page 198, I explained that 
vision has a nonlinear perceptual response to lumi- 
nance. Vision scientists have proposed many functions 
that relate relative luminance to perceived lightness; 
several of these functions are graphed in Figure 20.2. 



In 1976, the CIE standardized the L* function to 
approximate the lightness response of human vision. 
Other functions - such as Munsell Value - specify alter- 
nate lightness scales, but the CIE L* function is widely 
used and internationally standardized. 



L* is a power function of relative luminance, modified 
by the introduction of a linear segment near black: 



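$$
L^* =
\begin{cases}
903.3\,\dfrac{Y}{Y_n}, & \dfrac{Y}{Y_n} \le 0.008856 \\[1ex]
116\left(\dfrac{Y}{Y_n}\right)^{1/3} - 16, & 0.008856 < \dfrac{Y}{Y_n}
\end{cases}
\qquad \text{Eq 20.2}
$$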



L* has a range of 0 to 100. Y is CIE luminance (proportional
to intensity). Yn is the luminance of reference
To compute L* from optical
density D in the range 0 to 2,
use this relation:

$$L^* = 116 \cdot 10^{-D/3} - 16$$
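The margin relation follows from substituting Y/Yn = 10^(-D) into the cube-root branch of Eq 20.2, assuming reference white has density 0. Restricting D to 0 through 2 keeps Y/Yn at or above 0.01, safely above the 0.008856 breakpoint. A small Python check, with hypothetical function names:

```python
def lstar_from_density(d: float) -> float:
    """L* of a patch with optical density d, relative to a density-0 white."""
    if not 0.0 <= d <= 2.0:
        raise ValueError("relation assumes density in 0..2")
    return 116.0 * 10.0 ** (-d / 3.0) - 16.0


def lstar_via_relative_luminance(d: float) -> float:
    """Same result computed the long way: D -> Y/Yn -> cube-root branch of L*."""
    relative_y = 10.0 ** (-d)
    return 116.0 * relative_y ** (1.0 / 3.0) - 16.0


for d in (0.0, 0.5, 1.0, 2.0):
    print(d, round(lstar_from_density(d), 2), round(lstar_via_relative_luminance(d), 2))
```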



ΔL* is pronounced delta
EL-star.



white. The quotient Y/Yn is relative luminance. (If you
normalize luminance to unity, then you need not 
compute the quotient.) 

A linear segment is defined near black: For Y/Yn values
0.008856 or less, L* is proportional to Y/Yn. The
parameters have been chosen such that the breakpoint occurs
at an L* value of 8. This value corresponds to less than 
1% on the relative luminance scale! In a display system 
having a contrast ratio of 100:1, the entire reproduced 
image is confined to L* values between 8 and 100! The 
linear segment is important in color specification; 
however, Y/Yn values that small are rarely encountered
in video. (If you don't use the linear segment, make 
sure that you prevent L* from ranging below zero.) 
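Eq 20.2 can be sketched in Python as below, including the linear segment, which keeps L* non-negative for non-negative luminance. The function name and the default reference-white value of 1 are assumptions for illustration.

```python
def cie_lstar(y: float, y_n: float = 1.0) -> float:
    """CIE L* (Eq 20.2) from luminance y and reference-white luminance y_n."""
    t = y / y_n  # relative luminance
    if t <= 0.008856:
        return 903.3 * t  # linear segment near black
    return 116.0 * t ** (1.0 / 3.0) - 16.0  # offset cube-root segment


print(round(cie_lstar(0.18), 1))      # 18% "mid-gray"
print(round(cie_lstar(0.008856), 2))  # at the breakpoint
print(round(cie_lstar(18.0, 100.0), 1))  # same gray with white normalized to 100
```

The second line confirms the breakpoint value of 8 stated above, and the first that 18% gray lands near the middle of the scale.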

The linear and power function segments are defined to 
maintain function and tangent continuity at the break- 
point between the two segments. The exponent of the 
power function segment is 1/3, but the scale factor of
116 and the offset of -16 modify the pure power func- 
tion such that a 0.4-power function best approximates 
the overall curve. Roughly speaking, lightness is 100 
times the 0.4-power of relative luminance. 
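The "roughly speaking" rule can be checked numerically. The Python sketch below compares the cube-root branch of Eq 20.2 with 100 times the 0.4-power of relative luminance over a grid of relative luminance from 0.1 to 1, then at 0.01 to show where the approximation breaks down. The function names and grid spacing are illustrative choices.

```python
def lstar_cube_root(t: float) -> float:
    """Cube-root branch of Eq 20.2 (valid for relative luminance t > 0.008856)."""
    return 116.0 * t ** (1.0 / 3.0) - 16.0


def lstar_power_04(t: float) -> float:
    """Rule-of-thumb approximation: 100 times the 0.4-power of relative luminance."""
    return 100.0 * t ** 0.4


grid = [i / 100.0 for i in range(10, 101)]  # relative luminance 0.10 .. 1.00
worst = max(abs(lstar_cube_root(t) - lstar_power_04(t)) for t in grid)
print(f"largest difference for 0.1 <= Y/Yn <= 1: {worst:.2f} L* units")
print(f"at Y/Yn = 0.01: {lstar_cube_root(0.01):.2f} vs {lstar_power_04(0.01):.2f}")
```

Over the upper decade the two curves agree to within about two L* units; near black, where the offset of -16 dominates, the pure 0.4-power function reads several units too light.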

The difference between two L* values, denoted ΔL*, is
a measure of perceptual "distance." A difference of less
than unity between two L* values is generally
imperceptible - that is, ΔL* of unity is taken to lie at
the threshold of discrimination. L* provides one compo- 
nent of a uniform color space. The term perceptually 
linear is not appropriate: Since we cannot directly 
measure the quantity in question, we cannot assign to 
it any strong properties of mathematical linearity. 

In Chapter 8, Constant luminance, I described how 
video systems encode a luma signal (Y') that is an engi- 
neering approximation to lightness. That signal is only 
indirectly related to the relative luminance (Y) or the
lightness (L*) of color science.
The CIE system
of colorimetry 



21 



Figure 21.1 Example
coordinate system




For an approachable, nonmathe- 
matical introduction to color 
physics and perception, see 
Rossotti, Hazel, Colour: Why the 
World Isn't Grey (Princeton, N.J.: 
Princeton Univ. Press, 1983). 



The Commission Internationale de L'Eclairage (CIE) has 
defined a system that maps a spectral power distribu- 
tion (SPD) of physics into a triple of numerical values - 
CIE XYZ tristimulus values - that form the mathematical
coordinates of color space. In this chapter,
I describe the CIE system. In the following chapter,
Color science for video, I will explain how these XYZ
tristimulus values are related to linear-light RGB values.

Color coordinates are analogous to coordinates on 
a map (see Figure 21.1). Cartographers have different 
map projections for different functions: Some projec- 
tions preserve areas, others show latitudes and longi- 
tudes as straight lines. No single map projection fills all 
the needs of all map users. There are many "color 
spaces." As in maps, no single coordinate system fills all 
of the needs of users. 

In Chapter 20, Luminance and lightness, I introduced 
the linear-light quantity luminance. To reiterate, I use 
the term luminance and the symbol Y to refer to CIE 
luminance. I use the term luma and the symbol Y' to 
refer to the video component that conveys an approxi- 
mation to lightness. Most of the quantities in this 
chapter, and in the following chapter Color science for 
video, involve values that are proportional to intensity. 
In Chapter 8, Constant luminance, I related the theory of 
color science to the practice of video. To approximate 
perceptual uniformity, video uses quantities such as R',
G', B', and Y' that are not proportional to intensity.
About 8% of men and 0.4% of
women have deficient color vision, 
called color blindness. Some people 
have fewer than three types of 
cones; some people have cones with 
altered spectral sensitivities. 



Bill Schreiber points out that the 
words saturation and purity are 
often used interchangeably, to the 
dismay of purists. 



Fundamentals of vision 

As I explained in Retina, on page 195, human vision 
involves three types of color photoreceptor cone cells, 
which respond to incident radiation having wavelengths 
(λ) from about 400 nm to 700 nm. The three cell types
have different spectral responses; color is the percep- 
tual result of their absorption of light. Normal vision 
involves three types of cone cells, so three numerical 
values are necessary and sufficient to describe a color: 
Normal color vision is inherently trichromatic. 

Power distributions exist in the physical world; 
however, color exists only in the eye and the brain. 

Isaac Newton put it this way, in 1675: 

"Indeed rays, properly expressed, are not coloured." 

Definitions 

In Lightness terminology, on page 11, I defined
brightness, intensity, luminance, value, lightness, and
tristimulus value. In Appendix B, Introduction to radiometry and
photometry, on page 601, I give more rigorous defini- 
tions. In color science, it is important to use these terms 
carefully. It is especially important to differentiate phys- 
ical quantities (such as intensity and luminance), from 
perceptual quantities (such as lightness and value). 

Hue is the attribute of a visual sensation according to 
which an area appears to be similar to one of the 
perceived colors, red, yellow, green, and blue, or 
a combination of two of them. Roughly speaking, if the 
dominant wavelength of a spectral power distribution 
shifts, the hue of the associated color will shift. 

Saturation is the colorfulness of an area, judged in 
proportion to its brightness. Saturation is a perceptual 
quantity; like brightness, it cannot be measured. 

Purity is the ratio of the amount of a monochromatic 
stimulus to the amount of a specified achromatic stim- 
ulus which, when mixed additively, matches the color in 
question. Purity is the objective correlate of saturation. 
Figure 21.2 Spectral and tristimulus color
reproduction. A color can be represented as
a spectral power distribution (SPD), perhaps in
31 components representing power in 10 nm
bands over the range 400 nm to 700 nm.
However, owing to the trichromatic nature of
human vision, if appropriate spectral weighting
functions are used, three components suffice to
represent color. The SPD shown here is the
CIE D65 daylight illuminant.

[Figure panels: Spectral reproduction (31 components); Tristimulus reproduction (3 components). Horizontal axis: wavelength, nm.]

The more an SPD is concentrated 
near one wavelength, the more 
saturated the associated color will 
be. A color can be desaturated by 
adding light with power at all 
wavelengths. 



Spectral power distribution (SPD) and tristimulus 

The physical wavelength composition of light is 
expressed in a spectral power distribution (SPD, or
spectral radiance). An SPD representative of daylight is
graphed at the upper left of Figure 21.2 above. 

One way to reproduce a color is to directly reproduce 
its spectral power distribution. This approach, termed 
spectral reproduction, is suitable for reproducing a single 
color or a few colors. For example, the visible range of 
wavelengths from 400 nm to 700 nm could be divided 
into 31 bands, each 10 nm wide. However, using
31 components for each pixel is an impractical way to 
code an image. Owing to the trichromatic nature of 
vision, if suitable spectral weighting functions are used, 
any color can be described by just three components. 
This is called tristimulus reproduction. 



Strictly speaking, colorimetry refers 
to the measurement of color. In 
video, colorimetry is taken to 
encompass the transfer functions 
used to code linear RGB to R'G'B',
and the matrix that produces 
luma and color difference signals. 



The science of colorimetry concerns the relationship 
between SPDs and color. In 1931, the Commission 
Internationale de l'Éclairage (CIE) standardized
weighting curves for a hypothetical Standard Observer. 
These curves - graphed in Figure 21.4, on page 216 - 
specify how an SPD can be transformed into three 
tristimulus values that specify a color. 
Pronounced mehta-MAIR-ik
and meh-TAM-er-ism. 



For a textbook lowpass filter, 
see Figure 16.23 on page 162. 



To specify a color, it is not necessary to specify its spec- 
trum - it suffices to specify its tristimulus values. To 
reproduce a color, its spectrum need not be repro- 
duced - it suffices to reproduce its tristimulus values. 
This is known as a metameric match. Metamerism is the 
property of two spectrally different stimuli having the 
same tristimulus values. 
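Metamerism is easy to demonstrate numerically. The sketch below is a minimal Python example assuming NumPy. For brevity it samples the color-matching functions only every 50 nm, taking the 400, 450, ..., 700 nm rows of the table in Figure 21.5; that coarse sampling is an assumption of the example, not a colorimetric standard. Any spectral perturbation lying in the null space of the weighting matrix changes the SPD without changing X, Y, or Z. The names `CMF_50NM`, `tristimulus`, and `make_metamer` are hypothetical.

```python
import numpy as np

# Rows: 400, 450, 500, 550, 600, 650, 700 nm. Columns: X-bar, Y-bar, Z-bar.
# Subsampled from the 10 nm table of Figure 21.5 (illustrative only).
CMF_50NM = np.array([
    [0.0143, 0.0004, 0.0679],
    [0.3362, 0.0380, 1.7721],
    [0.0049, 0.3230, 0.2720],
    [0.4334, 0.9950, 0.0087],
    [1.0622, 0.6310, 0.0008],
    [0.2835, 0.1070, 0.0000],
    [0.0114, 0.0041, 0.0000],
])

def tristimulus(spd):
    """Weight a 7-band SPD by the CMFs to obtain [X, Y, Z]."""
    return CMF_50NM.T @ spd

def make_metamer(spd, amount):
    """Return a different SPD having the same tristimulus values.

    The 3x7 matrix CMF_50NM.T has rank 3, so the last row of V^T from its
    SVD lies in its null space; adding any multiple of it leaves X, Y, Z
    unchanged.
    """
    _, _, vt = np.linalg.svd(CMF_50NM.T)
    null_vector = vt[-1]              # unit length; CMF_50NM.T @ it ~ 0
    return spd + amount * null_vector

spd_a = np.full(7, 100.0)                  # hypothetical flat SPD
spd_b = make_metamer(spd_a, amount=20.0)   # every entry stays in 80..120

print(np.round(spd_b, 2))                  # a visibly different spectrum
print(np.round(tristimulus(spd_a), 3))
print(np.round(tristimulus(spd_b), 3))     # same XYZ as spd_a
```

Keeping `amount` well below the flat level of 100 keeps the perturbed SPD all-positive, so it remains physically realizable.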

The colors produced in reflective systems - such as 
photography, printing, or paint - depend not only upon 
the colorants and the substrate (media), but also on the 
SPD of the illumination. To guarantee that two colored 
materials will match under illuminants having different 
SPDs, you may have to achieve a spectral match. 

Scanner spectral constraints 

The relationship between spectral distributions and the 
three components of a color value is usually explained 
starting from the famous color-matching experiment. 

I will instead explain the relationship by illustrating the 
practical concerns of engineering the spectral filters 
required by a color scanner or camera, using Figure 21.3 
opposite. 

The top row shows the spectral sensitivity of three 
wideband optical filters having uniform response across 
each of the longwave, mediumwave, and shortwave 
regions of the spectrum. Most filters, whether for elec- 
trical signals or for optical power, are designed to have 
responses as uniform as possible across the passband, 
to have transition zones as narrow as possible, and to 
have maximum possible attenuation in the stopbands. 

At the top left of Figure 21.3, I show two
monochromatic sources, which appear saturated orange
and red, analyzed by "textbook" bandpass filters.
Although the two SPDs are perceived as different
colors, this filter set reports the identical RGB triple
[1, 0, 0] for both. The wideband filter set senses color
incorrectly.
Figure 21.3 Spectral constraints are associated with scanners and cameras. 1. Wideband filter
set of the top row shows the spectral sensitivity of filters having uniform response across the
shortwave, mediumwave, and longwave regions of the spectrum. Two monochromatic sources
seen by the eye to have different colors - in this case, a saturated orange and a saturated red -
cannot be distinguished by the filter set. 2. Narrowband filter set in the middle row solves that
problem, but creates another: Many monochromatic sources "fall between" the filters, and are
sensed as black. To see color as the eye does, the filter responses must closely relate to the color
response of the eye. 3. CIE-based filter set in the bottom row shows the color-matching
functions (CMFs) of the CIE Standard Observer.



At first glance it may seem that the problem with the 
wideband filters is insufficient wavelength discrimina- 
tion. The middle row of the example attempts to solve 
that problem by using three narrowband filters. The 
narrowband set solves one problem, but creates 
another: Many monochromatic sources "fall between" 
the filters. Here, the orange source reports an RGB 
triple of [0, 0, 0], identical to the result of scanning 
black. 
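Both failure modes can be reproduced with a toy model. In this Python sketch, each filter is an idealized box with unit response inside its passband and zero elsewhere. The band edges, and the source wavelengths of 610 nm for orange and 650 nm for red, are assumptions chosen to mimic Figure 21.3, not values read from it.

```python
# Toy model of the wideband and narrowband filter sets of Figure 21.3.
# Band edges and source wavelengths are illustrative assumptions.

def box_filter(lo_nm, hi_nm):
    """Filter responding 1.0 for lo_nm <= wavelength < hi_nm, else 0.0."""
    return lambda wavelength_nm: 1.0 if lo_nm <= wavelength_nm < hi_nm else 0.0

# Filter sets listed in R, G, B order.
WIDEBAND = [box_filter(600, 700), box_filter(500, 600), box_filter(400, 500)]
NARROWBAND = [box_filter(640, 660), box_filter(540, 560), box_filter(440, 460)]

def sense_monochromatic(filters, wavelength_nm, power=1.0):
    """RGB triple that a filter set reports for a monochromatic source."""
    return [power * f(wavelength_nm) for f in filters]

ORANGE_NM, RED_NM = 610, 650   # hypothetical source wavelengths

# Wideband: orange and red are indistinguishable.
print(sense_monochromatic(WIDEBAND, ORANGE_NM))    # [1.0, 0.0, 0.0]
print(sense_monochromatic(WIDEBAND, RED_NM))       # [1.0, 0.0, 0.0]
# Narrowband: orange falls between the passbands and reads as black.
print(sense_monochromatic(NARROWBAND, ORANGE_NM))  # [0.0, 0.0, 0.0]
print(sense_monochromatic(NARROWBAND, RED_NM))     # [1.0, 0.0, 0.0]
```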

Although my example is contrived, the problem is not. 
Ultimately, the test of whether a camera or scanner is 
successful is whether it reports distinct RGB triples if 
and only if human vision sees two SPDs as being 
Figure 21.4 CIE 1931, 2° color-matching functions. A camera with 3 sensors must have these
spectral response curves, or linear combinations of them, in order to capture all colors. However,
practical considerations make this difficult. These analysis functions are not comparable to
spectral power distributions!



CIE N° 15.2, Colorimetry, Second
Edition (Vienna, Austria: Commis-
sion Internationale de l'Éclairage,
1986); reprinted with corrections
in 1996.

In CIE N° 15.2, color-matching
functions are denoted x̄(λ), ȳ(λ),
and z̄(λ) [pronounced ECKS-bar,
WYE-bar, ZEE-bar]. CIE N° 15.3 is
in draft status, and I have
adopted its new notation X̄(λ),
Ȳ(λ), and Z̄(λ).

Some authors refer to CMFs 
as color mixture curves, or CMCs. 
That usage is best avoided, 
because CMC denotes a particular 
color difference formula defined 
in British Standard BS:6923. 
different colors. For a scanner or a camera to see color
as the eye does, the filter sensitivity curves must be 
intimately related to the response of human vision. 

The famous "color-matching experiment" was devised 
during the 1920s to characterize the relationship 
between physical spectra and perceived color. The exper- 
iment measures mixtures of different spectral distribu- 
tions that are required for human observers to match 
colors. From statistics gathered from the observers who
participated in these experiments, in 1931 the CIE
standardized a set of spectral weighting functions that
model the perception of color.

These curves are called the X̄(λ), Ȳ(λ), and Z̄(λ)
color-matching functions (CMFs) for the CIE Standard
Observer. They are illustrated at the bottom of
Figure 21.3, and are graphed at a larger scale in
Figure 21.4 above. They are defined numerically; they
are everywhere nonnegative.

DIGITAL VIDEO AND HDTV ALGORITHMS AND INTERFACES 



The term sharpening is used in the 
color science community to 
describe certain 3x3 transforms 
of cone fundamentals. This termi- 
nology is unfortunate, because in 
image science, sharpening refers 
to spatial phenomena. 



X, Y, and Z are pronounced big-X, 
big-Y, and big-Z, or cap-X, cap-Y, 
and cap-Z, to distinguish them 
from little x and little y, to be 
described in a moment. 
The CIE 1931 functions are appropriate to estimate the
visual response to stimuli subtending angles of about 2°
at the eye. In 1964, the CIE standardized a set of CMFs
suitable for stimuli subtending about 10°; this set is
generally unsuitable for image reproduction.

The functions of the CIE Standard Observer were stan- 
dardized based upon experiments with visual color 
matching. Research since then has revealed the spectral
absorbance of the three types of cone cells - the cone 
fundamentals. We would expect the CIE CMFs to be 
intimately related to the properties of the retinal photo- 
receptors; many experimenters have related the cone 
fundamentals to CIE tristimulus values through 3x3 
linear matrix transforms. None of the proposed 
mappings is very accurate, apparently owing to the 
intervention of high-level visual processing. For engi- 
neering purposes, the CIE functions suffice. 

The Ȳ(λ) and Z̄(λ) CMFs each have one peak - they are
"unimodal." However, the X̄(λ) CMF has a secondary
peak, between 400 nm and 500 nm. This does not
directly reflect any property of the retinal response;
instead, it is a consequence of the mathematical
process by which the X̄(λ), Ȳ(λ), and Z̄(λ) curves are
constructed.

CIE XYZ tristimulus 

Weighting an SPD under the Ȳ(λ) color-matching
function yields luminance (symbol Y), as I described on
page 205. When luminance is augmented with two
other values, computed in the same manner as
luminance but using the X̄(λ) and Z̄(λ) color-matching
functions, the resulting values are known as XYZ tristimulus
values (denoted X, Y, and Z). XYZ values correlate to
the spectral sensitivity of human vision. Their
amplitudes - always positive - are proportional to intensity.

Tristimulus values are computed from a continuous SPD
by integrating the SPD under the X̄(λ), Ȳ(λ), and Z̄(λ)
color-matching functions. In discrete form, tristimulus
values are computed by a matrix multiplication, as
illustrated in Figure 21.5 overleaf.
Figure 21.5 data (rows at 10 nm intervals; the first three
columns form the 31×3 CMF matrix, the last column is the
31-element D65 SPD vector):

λ, nm   X̄(λ)     Ȳ(λ)     Z̄(λ)     D65
400     0.0143   0.0004   0.0679    82.75
410     0.0435   0.0012   0.2074    91.49
420     0.1344   0.0040   0.6456    93.43
430     0.2839   0.0116   1.3856    86.68
440     0.3483   0.0230   1.7471   104.86
450     0.3362   0.0380   1.7721   117.01
460     0.2908   0.0600   1.6692   117.81
470     0.1954   0.0910   1.2876   114.86
480     0.0956   0.1390   0.8130   115.92
490     0.0320   0.2080   0.4652   108.81
500     0.0049   0.3230   0.2720   109.35
510     0.0093   0.5030   0.1582   107.80
520     0.0633   0.7100   0.0782   104.79
530     0.1655   0.8620   0.0422   107.69
540     0.2904   0.9540   0.0203   104.41
550     0.4334   0.9950   0.0087   104.05
560     0.5945   0.9950   0.0039   100.00
570     0.7621   0.9520   0.0021    96.33
580     0.9163   0.8700   0.0017    95.79
590     1.0263   0.7570   0.0011    88.69
600     1.0622   0.6310   0.0008    90.01
610     1.0026   0.5030   0.0003    89.60
620     0.8544   0.3810   0.0002    87.70
630     0.6424   0.2650   0.0000    83.29
640     0.4479   0.1750   0.0000    83.70
650     0.2835   0.1070   0.0000    80.03
660     0.1649   0.0610   0.0000    80.21
670     0.0874   0.0320   0.0000    82.28
680     0.0468   0.0170   0.0000    78.28
690     0.0227   0.0082   0.0000    69.72
700     0.0114   0.0041   0.0000    71.61

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} \bar{X}(\lambda) & \bar{Y}(\lambda) & \bar{Z}(\lambda) \end{bmatrix}^{\mathsf{T}} \cdot \mathrm{SPD}(\lambda)$$

Figure 21.5 Calculation of tristimulus values by matrix multipli-
cation starts with a column vector representing the SPD. The
31-element column vector in this example is a discrete version of
CIE Illuminant D65, at 10 nm intervals. The SPD is matrix-multi-
plied by a discrete version of the CIE X̄(λ), Ȳ(λ), and Z̄(λ) color-
matching functions of Figure 21.4, here in a 31×3 matrix. The
superscript T denotes the matrix transpose operation. The result of
the matrix multiplication is a set of XYZ tristimulus components.
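The matrix multiplication of Figure 21.5 can be sketched in a few lines of Python, assuming NumPy. The table is transcribed from the figure; `spd_to_xyz` is a hypothetical helper name. Because the data are sampled at 10 nm and truncated to 400 nm through 700 nm, the result should land close to, but not exactly on, the tabulated D65 white point.

```python
import numpy as np

# Figure 21.5 data, 400 nm to 700 nm at 10 nm intervals (31 rows).
# Columns: X-bar, Y-bar, Z-bar color-matching functions, then the D65 SPD.
FIG_21_5 = np.array([
    [0.0143, 0.0004, 0.0679,  82.75],   # 400 nm
    [0.0435, 0.0012, 0.2074,  91.49],
    [0.1344, 0.0040, 0.6456,  93.43],
    [0.2839, 0.0116, 1.3856,  86.68],
    [0.3483, 0.0230, 1.7471, 104.86],
    [0.3362, 0.0380, 1.7721, 117.01],   # 450 nm
    [0.2908, 0.0600, 1.6692, 117.81],
    [0.1954, 0.0910, 1.2876, 114.86],
    [0.0956, 0.1390, 0.8130, 115.92],
    [0.0320, 0.2080, 0.4652, 108.81],
    [0.0049, 0.3230, 0.2720, 109.35],   # 500 nm
    [0.0093, 0.5030, 0.1582, 107.80],
    [0.0633, 0.7100, 0.0782, 104.79],
    [0.1655, 0.8620, 0.0422, 107.69],
    [0.2904, 0.9540, 0.0203, 104.41],
    [0.4334, 0.9950, 0.0087, 104.05],   # 550 nm
    [0.5945, 0.9950, 0.0039, 100.00],
    [0.7621, 0.9520, 0.0021,  96.33],
    [0.9163, 0.8700, 0.0017,  95.79],
    [1.0263, 0.7570, 0.0011,  88.69],
    [1.0622, 0.6310, 0.0008,  90.01],   # 600 nm
    [1.0026, 0.5030, 0.0003,  89.60],
    [0.8544, 0.3810, 0.0002,  87.70],
    [0.6424, 0.2650, 0.0000,  83.29],
    [0.4479, 0.1750, 0.0000,  83.70],
    [0.2835, 0.1070, 0.0000,  80.03],   # 650 nm
    [0.1649, 0.0610, 0.0000,  80.21],
    [0.0874, 0.0320, 0.0000,  82.28],
    [0.0468, 0.0170, 0.0000,  78.28],
    [0.0227, 0.0082, 0.0000,  69.72],
    [0.0114, 0.0041, 0.0000,  71.61],   # 700 nm
])
CMF = FIG_21_5[:, :3]   # 31x3 matrix of color-matching functions
D65 = FIG_21_5[:, 3]    # 31-element SPD column vector

def spd_to_xyz(spd, cmf=CMF):
    """Discrete tristimulus calculation: [X, Y, Z] = cmf^T . spd."""
    return cmf.T @ spd

xyz = spd_to_xyz(D65)
white = xyz / xyz[1]             # scale so that Y = 1
chroma = xyz[:2] / xyz.sum()     # [x, y] by Eq 21.1
print(np.round(white, 4), np.round(chroma, 4))
```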
Grassmann's Third Law:

Sources of the same color
produce identical effects in an
additive mixture regardless of
their spectral composition.



Thornton, William A., “Spectral 
Sensitivities of the Normal 
Human Visual System, Color- 
Matching Functions and Their 
Principles, and How and Why 
the Two Sets Should Coincide," 
in Color Research and Application 
24 (2): 139-156 (April 1999). 



The x and y symbols are 
pronounced little-x and little-y. 



Human color vision follows a principle of superposition
known as Grassmann's Third Law: The tristimulus values
computed from the sum of a set of SPDs are identical to
the sum of the tristimulus values of each SPD. Due to
this linearity of additive color mixture, any set of three
components that is a nontrivial linear combination of X,
Y, and Z - such as R, G, and B - is also a set of
tristimulus values. (In Transformations between RGB and
CIE XYZ, on page 251, I will introduce related CMFs
that produce R, G, and B tristimulus values.)

This chapter accepts the CIE Standard Observer rather 
uncritically. Although the CIE Standard Observer is very 
useful and widely used, some researchers believe that it 
exhibits some problems and ought to be improved. For 
one well-informed and provocative view, see Thornton. 

CIE [x,y] chromaticity 

It is convenient, for both conceptual understanding and 
for computation, to have a representation of “pure" 
color in the absence of lightness. The CIE standardized 
a procedure for normalizing XYZ tristimulus values to 
obtain two chromaticity values x and y. 



Chromaticity values are computed by this projective
transformation:

$$x = \frac{X}{X + Y + Z}; \qquad y = \frac{Y}{X + Y + Z}$$   Eq 21.1



A third chromaticity coordinate, z, is defined, but is 
redundant since x + y + z = 1. The x and y chromaticity
coordinates are abstract values that have no direct 
physical interpretation. 



A color can be specified by its chromaticity and
luminance, in the form of an xyY triple. To recover X and Z
tristimulus values from [x, y] chromaticities and
luminance, use the inverse of Equation 21.1:

$$X = \frac{x}{y}\,Y; \qquad Z = \frac{1 - x - y}{y}\,Y$$   Eq 21.2



A color plots as a point in an [x, y] chromaticity 
diagram, plotted in Figure 21.6 overleaf. 
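Equations 21.1 and 21.2 translate directly into code. This is a minimal Python sketch; the function names are hypothetical, and the guard against division by zero reflects the fact that black has no chromaticity.

```python
def xyz_to_xyy(X, Y, Z):
    """Eq 21.1: chromaticity [x, y], with luminance Y carried along."""
    total = X + Y + Z
    if total == 0:
        raise ValueError("black [0, 0, 0] has no chromaticity")
    return X / total, Y / total, Y

def xyy_to_xyz(x, y, Y):
    """Eq 21.2: recover X and Z from chromaticity [x, y] and luminance Y."""
    if y == 0:
        raise ValueError("y = 0 cannot be inverted")
    return x / y * Y, Y, (1.0 - x - y) / y * Y

# D65 chromaticity (from Table 21.1) with luminance normalized to 1.
print(xyy_to_xyz(0.312727, 0.329024, 1.0))
```

Applied to the D65 chromaticity with Y = 1, `xyy_to_xyz` gives the familiar normalized D65 white of about [0.9505, 1, 1.0888].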
Figure 21.6 CIE 1931 2° [x, y] chromaticity diagram. The spectral locus is a shark-fin-shaped
path swept by a monochromatic source as it is tuned from 400 nm to 700 nm. The set of all
colors is closed by the line of purples, which traces SPDs that combine longwave and shortwave
power but have no mediumwave power. All colors lie within the shark-fin-shaped region: Points
outside this region are not colors.

This diagram is not a slice through [X, Y, Z] space! Instead, points in [X, Y, Z] project onto the
plane of the diagram in a manner comparable to the perspective projection. White has [X, Y, Z]
values near [1, 1, 1]; it projects to a point near the center of the diagram, in the region of
[1/3, 1/3]. Attempting to project black, at [0, 0, 0], would require dividing by zero: Black has no
place in this diagram.
Figure 21.7 CIE [x, y] chart features [sketch labels: blackbody locus, white point]



In Figure 21.7 in the margin, I sketch several features of
the [x, y] diagram. The important features lie on, or
below and to the left of, the line y = 1 − x.

When a narrowband (monochromatic) SPD comprising 
power at just one wavelength is swept across the range 
400 nm to 700 nm, it traces the inverted-U-shaped 
spectral locus in [x, y] coordinates. 

The sensation of purple cannot be produced by a single 
wavelength; it requires a mixture of shortwave and 
longwave light. The line of purples on a chromaticity 
diagram joins the chromaticity of extreme blue (violet), 
containing only shortwave power, to the chromaticity of 
extreme red, containing only longwave power. 

There is no unique physical or perceptual definition of 
white. Many important sources of illumination are 
blackbody radiators, whose chromaticity coordinates lie 
on the blackbody locus (sometimes called the Planckian
locus). The SPDs of blackbody radiators will be 
discussed in the next section. 

An SPD that appears white has CIE [X, Y, Z] values of 
about [1, 1, 1], and [x, y] coordinates in the region of
[1/3, 1/3]: White plots in the central area of the
chromaticity diagram. In the section White, on page 223, I will
describe the SPDs associated with white. 

Any all-positive (physical, or realizable) SPD plots as
a single point in the chromaticity diagram, within the 
region bounded by the spectral locus and the line of 
purples. All colors lie within this region; points outside 
this region are not associated with colors. It is silly to 
qualify "color" by "visible," because color is itself 
defined by vision - if it's invisible, it's not a color! 

In the projective transformation that forms x and y, any 
additive mixture (linear combination) of two SPDs - or 
two tristimulus values - plots on a straight line in the 
[x,y] plane. However, distances are not preserved, so 
chromaticity values do not combine linearly. Neither 
[X, Y, Z] nor [x, y] coordinates are perceptually uniform. 
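Both properties can be seen at once in a small Python sketch with hypothetical names. It mixes two illustrative tristimulus triples (roughly a saturated red and a saturated blue, chosen only for illustration) in equal amounts. The mixture's chromaticity is collinear with the two source chromaticities, yet it does not fall at their midpoint: In chromaticity terms, each source is effectively weighted by its own X + Y + Z, not by its share of the mixture.

```python
# Illustrative tristimulus triples; any two positive triples would do.
RED = (0.4124, 0.2126, 0.0193)
BLUE = (0.1805, 0.0722, 0.9505)

def chromaticity(xyz):
    """[x, y] by Eq 21.1."""
    X, Y, Z = xyz
    total = X + Y + Z
    return X / total, Y / total

def mix(xyz_a, xyz_b, w):
    """Additive mixture: tristimulus values combine linearly (Grassmann)."""
    return tuple((1.0 - w) * a + w * b for a, b in zip(xyz_a, xyz_b))

def cross(p, q, r):
    """z-component of (q - p) x (r - p); zero when p, q, r are collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

p_red, p_blue = chromaticity(RED), chromaticity(BLUE)
p_mix = chromaticity(mix(RED, BLUE, 0.5))
midpoint = ((p_red[0] + p_blue[0]) / 2, (p_red[1] + p_blue[1]) / 2)

print(abs(cross(p_red, p_blue, p_mix)) < 1e-12)  # on the line joining them
print(p_mix, midpoint)                           # but not at the midpoint
```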
Figure 21.8 SPDs of
blackbody radiators at
several temperatures are
graphed here. As the
temperature increases,
the absolute power
increases and the peak
of the spectral
distribution shifts toward
shorter wavelengths.
[Horizontal axis: wavelength, nm.]



Blackbody radiation 

Max Planck determined that the SPD radiated from 
a hot object - a blackbody radiator - is a function of the 
temperature to which the object is heated. Figure 21.8 
above shows the SPDs of blackbody radiators at several 
temperatures. As temperature increases, the absolute 
power increases and the spectral peak shifts toward 
shorter wavelengths. If the power of blackbody radia- 
tors is normalized at an arbitrary wavelength, dramatic 
differences in spectral character become evident, as 
illustrated in Figure 21.9 opposite. 
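The text does not state Planck's formula; the sketch below assumes its standard form, in which spectral power is proportional to λ⁻⁵ / (exp(c₂ / λT) − 1), with the second radiation constant c₂ ≈ 1.4388 × 10⁻² m·K. It reproduces the trends just described: The spectral peak moves toward shorter wavelengths as temperature rises, and after normalizing at 555 nm, as in Figure 21.9, hotter radiators carry relatively more shortwave power. The function names are hypothetical.

```python
import math

C2 = 1.4388e-2   # second radiation constant hc/k, in metre-kelvin

def planck_relative(wavelength_nm, temp_k):
    """Blackbody spectral power by Planck's law, up to a constant factor.

    The dropped factor (2hc^2) is the same at every temperature, so values
    for different temperatures remain directly comparable.
    """
    lam = wavelength_nm * 1e-9   # nanometres to metres
    return 1.0 / (lam ** 5 * math.expm1(C2 / (lam * temp_k)))

def normalized_spd(temp_k, ref_nm=555.0, start=400, stop=700, step=10):
    """SPD sampled from start to stop nm, scaled to 1.0 at ref_nm."""
    ref = planck_relative(ref_nm, temp_k)
    return {wl: planck_relative(wl, temp_k) / ref
            for wl in range(start, stop + 1, step)}

def peak_wavelength_nm(temp_k, start=100, stop=5000):
    """Wavelength, on a 1 nm grid, at which the blackbody SPD is largest."""
    return max(range(start, stop + 1), key=lambda wl: planck_relative(wl, temp_k))

for t in (3200, 5000, 6500, 10000):
    print(t, "K: peak near", peak_wavelength_nm(t), "nm;",
          "power at 450 nm relative to 555 nm =", round(normalized_spd(t)[450], 2))
```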

Many sources of illumination have, at their core,
a heated object, so it is useful to characterize an
illuminant by specifying the absolute temperature (in
units of kelvin, K) of a blackbody radiator having the
same hue.

The symbol for kelvin is properly
written K (with no degree sign).

The blackbody locus is the path traced in [x, y] coordi- 
nates as the temperature of a blackbody source is 
raised. At low temperature, the source appears red 
("red hot"). When a viewer is adapted to a white
reference of CIE D65, which I will describe in a moment, at
about 2000 K, the source appears orange. Near 4000 K, 
it appears yellow; at about 6000 K, white. Above 
10000 K, it is blue-hot. 
Figure 21.9 SPDs of
blackbody radiators,
normalized to equal
power at 555 nm, are
graphed here. The
dramatically different spectral
character of blackbody
radiators at different
temperatures is evident.
[Horizontal axis: wavelength, nm.]

The 1960 [u, v] coordinates are
described in the marginal note
on page 226.



Color temperature 

An illuminant may be specified by a single color temp- 
erature number, also known as correlated color tempera- 
ture (CCT). However, it takes two numbers to specify 
chromaticity! To address this deficiency, color tempera- 
ture is sometimes augmented by a second number 
giving the closest distance in the deprecated CIE 1960 
[u, v] coordinates of the color from the blackbody 
locus - the arcane "minimum perceptible color differ- 
ence" (MPCD) units. It is more sensible to directly 
specify [x, y] or [u', v'] chromaticity coordinates.
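The information lost by a single CCT number can be illustrated with McCamy's published cubic approximation for CCT. That formula is an outside technique, not one given in this text, and it is only reasonable near the blackbody locus. In this Python sketch, a second, hypothetical chromaticity is constructed to yield exactly the same CCT as D65 while lying farther from the blackbody locus.

```python
def mccamy_cct(x, y):
    """Approximate CCT in kelvin from CIE 1931 [x, y] (McCamy's cubic).

    Only meaningful for chromaticities near the blackbody locus.
    """
    n = (x - 0.3320) / (0.1858 - y)
    return 449.0 * n ** 3 + 3525.0 * n ** 2 + 6823.3 * n + 5520.33

x_d65, y_d65 = 0.312727, 0.329024     # D65, from Table 21.1
cct_d65 = mccamy_cct(x_d65, y_d65)    # roughly 6500 K

# Hypothetical second chromaticity: shift y by +0.02, then choose x so that
# McCamy's parameter n, and hence the computed CCT, is unchanged.
n_d65 = (x_d65 - 0.3320) / (0.1858 - y_d65)
y2 = y_d65 + 0.02
x2 = 0.3320 + n_d65 * (0.1858 - y2)

print(round(cct_d65), round(mccamy_cct(x2, y2)))  # one CCT, two chromaticities
```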

White 

As I mentioned a moment ago, there is no unique defi- 
nition of white: To achieve accurate color, you must 
specify the SPD or the chromaticity of white. In addi- 
tive mixture, to be detailed on page 234, the white 
point is the set of tristimulus values (or the luminance 
and chromaticity coordinates) of the color reproduced 
by equal contributions of the red, green, and blue 
primaries. The color of white is a function of the ratio - 
or balance - of power among the primary components. 
(In subtractive reproduction, the color of white is deter- 
mined by the SPD of the illumination, multiplied by the 
SPD of the uncolored media.) 
Figure 21.10 CIE illuminants are graphed here. Illuminant A is an obsolete standard
representative of tungsten illumination; its SPD resembles the blackbody radiator at 3200 K shown in
Figure 21.9, on page 223. Illuminant C was an early standard for daylight; it too is obsolete.
The family of D illuminants represents daylight at several color temperatures.



It is sometimes convenient for purposes of calculation 
to define white as an SPD whose power is uniform 
throughout the visible spectrum. This white reference is 
known as the equal-energy illuminant, denoted CIE
Illuminant E; its CIE [x, y] coordinates are [1/3, 1/3].



The CIE D illuminants are
properly denoted with a two-digit
subscript. CIE Illuminant D65 has
a correlated color temperature of
about 6504 K.



A more realistic reference, approximating daylight, has 
been numerically specified by the CIE as Illuminant D65.
You should use this unless you have a good reason to
use something else. The print industry commonly uses
D50 and photography commonly uses D55; these
represent compromises between the conditions of indoor
(tungsten) and daylight viewing. Figure 21.10 above 
shows the SPDs of several standard illuminants; chro- 
maticity coordinates are given in Table 21.1 opposite. 



Concerning 9300 K, see page 254.

Many computer monitors and many consumer television
receivers have a default color temperature setting
of 9300 K. That white reference contains too much blue
to achieve acceptable image reproduction in Europe or
America. However, there is a cultural preference in Asia
for a more bluish reproduction than D65; 9300 K is
common in Asia (e.g., in studio monitors in Japan).
Tungsten illumination can't have
a color temperature higher than 
tungsten's melting point, 3680 K. 



Human vision adapts to the viewing environment. An 
image viewed in isolation - such as a 35 mm slide, or 
motion picture film projected in a dark room - creates 
its own white reference; a viewer will be quite tolerant 
of variation in white point. However, if the same image 
is viewed alongside an external white reference, or with 
a second image, differences in white point can be 
objectionable. Complete adaptation seems to be 
confined to color temperatures from about 5000 K to
6500 K. Tungsten illumination, at about 3200 K, almost 
always appears somewhat yellow. 



Table 21.1 enumerates the chromaticity coordinates of 
several common white references: 



Notation                          x          y          z          u'n      v'n
CIE Ill. A (obsolete)             0.4476     0.4074     0.1450     0.2560   0.5243
CIE Ill. B (obsolete)             0.3484     0.3516     0.3000     0.2137   0.4852
CIE Ill. C (obsolete)             0.3101     0.3162     0.3737     0.2009   0.4609
CIE Ill. D50                      0.3457     0.3587     0.2956     0.2091   0.4882
CIE Ill. D55                      0.3325     0.3476     0.3199     0.2044   0.4801
CIE Ill. D65                      0.312727   0.329024   0.358250   0.1978   0.4683
CIE Ill. E (equi-energy)          0.333334   0.333330   0.333336   0.2105   0.4737
9300 K (discouraged, but used
  in studio standards in Japan)   0.283      0.298      0.419      0.1884   0.4463

Table 21.1 White references



Perceptually uniform color spaces 

As I outlined in Perceptual uniformity, on page 21, 
a system is perceptually uniform if a small perturbation 
to a component value is approximately equally percep- 
tible across the range of that value. 



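Luminance is not perceptually uniform. On page 208,
I described how luminance can be transformed to
lightness, denoted L*, which is nearly perceptually uniform:

$$L^* = \begin{cases} 903.3\,\dfrac{Y}{Y_n}, & \dfrac{Y}{Y_n} \le 0.008856 \\[1ex] 116\left(\dfrac{Y}{Y_n}\right)^{1/3} - 16, & 0.008856 < \dfrac{Y}{Y_n} \end{cases}$$   Eq 21.3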
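Equation 21.3 can be sketched as a small Python function (the name `lightness` is hypothetical) that takes the relative luminance Y/Yn as its argument.

```python
def lightness(y_rel):
    """CIE L* (Eq 21.3) from relative luminance y_rel = Y / Yn."""
    if y_rel <= 0.008856:
        return 903.3 * y_rel                     # linear segment near black
    return 116.0 * y_rel ** (1.0 / 3.0) - 16.0   # cube-root segment

for y_rel in (0.0, 0.008856, 0.18, 0.5, 1.0):
    print(y_rel, round(lightness(y_rel), 2))
```

With these constants the two segments meet almost exactly at Y/Yn = 0.008856, where both give L* of about 8; a mid-gray of 18% relative luminance maps to an L* of about 49.5.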
L*u*v* and L*a*b* are sometimes
written CIELUV and CIELAB; they
are pronounced SEA-love and
SEA-lab. The u* and v* quantities
of color science - and the u' and
v' quantities, to be described -
are unrelated to the U and V color
difference components of video.



The primes in the CIE 1976 u' and
v' quantities denote the successor
to the obsolete 1960 CIE u and v
quantities: u = u'; v = (2/3) v'. The
primes are not formally related to
the primes in R', G', B', and Y',
though all imply some degree of
perceptual uniformity.



Extending this concept to color, XYZ and RGB tristim- 
ulus values, and xyY (chromaticity and luminance), are 
far from perceptually uniform. Finding a transformation 
of XYZ into a reasonably perceptually uniform space 
occupied the CIE for a decade, and in the end no single 
system could be agreed upon. In 1976, the CIE stan- 
dardized two systems, L*u*v* and L*a*b*, which I will 
now describe. 



CIE L*u*v* 

Computation of CIE L*u*v* involves a projective
transformation of [X, Y, Z] into intermediate u' and v'
quantities:

$$u' = \frac{4X}{X + 15Y + 3Z}; \qquad v' = \frac{9Y}{X + 15Y + 3Z}$$   Eq 21.4

Equivalently, u' and v' can be computed from x and y
chromaticity:

$$u' = \frac{4x}{3 - 2x + 12y}; \qquad v' = \frac{9y}{3 - 2x + 12y}$$   Eq 21.5

Since u' and v' are formed by a projective transforma- 
tion, u' and v' coordinates are associated with 
a chromaticity diagram similar to the CIE 1931 2° [x, y] 
chromaticity diagram on page 220. You should use the 
[u', v'] diagram if your plots are intended to be sugges- 
tive of perceptible differences. 



To recover X and Z tristimulus values from u' and v', use
these relations:

$$X = \frac{9u'}{4v'}\,Y; \qquad Z = \frac{12 - 3u' - 20v'}{4v'}\,Y$$   Eq 21.6



To recover x and y chromaticity from u' and v', use
these relations:

$$x = \frac{9u'}{6u' - 16v' + 12}; \qquad y = \frac{4v'}{6u' - 16v' + 12}$$   Eq 21.7
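Equations 21.4, 21.6, and 21.7, together with the equivalent x, y form of u' and v', can be collected into four small conversion functions. This Python sketch uses hypothetical names; the D65 and Illuminant A entries of Table 21.1 serve as a check.

```python
def xyz_to_uv(X, Y, Z):
    """Eq 21.4: [u', v'] from tristimulus values."""
    d = X + 15.0 * Y + 3.0 * Z
    return 4.0 * X / d, 9.0 * Y / d

def xy_to_uv(x, y):
    """[u', v'] from [x, y] chromaticity (equivalent x, y form)."""
    d = 3.0 - 2.0 * x + 12.0 * y
    return 4.0 * x / d, 9.0 * y / d

def uv_to_xyz(u, v, Y):
    """Eq 21.6: recover X and Z from [u', v'] and luminance Y."""
    return 9.0 * u / (4.0 * v) * Y, Y, (12.0 - 3.0 * u - 20.0 * v) / (4.0 * v) * Y

def uv_to_xy(u, v):
    """Eq 21.7: recover [x, y] from [u', v']."""
    d = 6.0 * u - 16.0 * v + 12.0
    return 9.0 * u / d, 4.0 * v / d

u_n, v_n = xy_to_uv(0.312727, 0.329024)   # D65 chromaticity from Table 21.1
print(round(u_n, 4), round(v_n, 4))       # 0.1978 0.4683, as in Table 21.1
```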

To compute u* and v*, first compute L*. Then compute
u'n and v'n from your reference white Xn, Yn, and Zn.
(The subscript n suggests normalized.) The u'n and v'n
coordinates for several common white points are given
DELTA E-star. 



in Table 21.1, White references, on page 225. Finally,
compute u* and v*:

$$u^* = 13\,L^*\,(u' - u'_n); \qquad v^* = 13\,L^*\,(v' - v'_n)$$   Eq 21.8

For gamuts typical of image reproduction, each u* and
v* value ranges approximately ±100.



Euclidean distance in L*u*v*, denoted ΔE*uv, is taken
to measure the perceptibility of color differences:

$$\Delta E^*_{uv} = \sqrt{(\Delta L^*)^2 + (\Delta u^*)^2 + (\Delta v^*)^2}$$   Eq 21.9

If ΔE*uv is unity or less, the color difference is taken to
be imperceptible. However, L*u*v* does not achieve
perceptual uniformity: It is merely an approximation.

ΔE*uv values between about 1 and 4 may or may not be
perceptible, depending upon the region of color space
being examined. ΔE*uv values greater than 4 are likely to
be perceptible; whether such differences are
objectionable depends upon circumstances.



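A polar-coordinate version of the [u*, v*] pair can be
used to express chroma and hue:

$$C^*_{uv} = \sqrt{u^{*2} + v^{*2}}; \qquad h_{uv} = \tan^{-1}\frac{v^*}{u^*}$$   Eq 21.10

In addition, there is a "psychometric saturation" term:

$$s_{uv} = \frac{C^*_{uv}}{L^*} = 13\sqrt{(u' - u'_n)^2 + (v' - v'_n)^2}$$   Eq 21.11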
Chroma, hue, and saturation defined here are not 
directly related to saturation and hue in the HSB, HSI, 
HSL, HSV, and IHS systems used in computing and in 
digital image processing: Most of the published descrip- 
tions of these spaces, and most of the published 
formulae, disregard the principles of color science. In 
particular, the quantities called lightness and value are 
wildly inconsistent with their definitions in color 
science. 
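The L*u*v* pipeline (Equations 21.3, 21.4, 21.8, and 21.10, plus the ΔE*uv distance and the saturation term) can be sketched in Python as follows. The names are hypothetical, and the D65 reference white is derived from the Table 21.1 chromaticity by Equation 21.2 with Yn = 1. For hue, `math.atan2` stands in for a bare arctangent so that the angle lands in the correct quadrant.

```python
import math

def _lightness(t):
    """L* from t = Y / Yn (Eq 21.3)."""
    return 903.3 * t if t <= 0.008856 else 116.0 * t ** (1.0 / 3.0) - 16.0

def _uv_prime(X, Y, Z):
    """[u', v'] from tristimulus values (Eq 21.4)."""
    d = X + 15.0 * Y + 3.0 * Z
    if d == 0.0:
        return 0.0, 0.0   # black: L* = 0 makes u* and v* zero regardless
    return 4.0 * X / d, 9.0 * Y / d

def xyz_to_luv(xyz, white):
    """CIE L*u*v* of xyz relative to the reference white [Xn, Yn, Zn]."""
    L = _lightness(xyz[1] / white[1])
    u_p, v_p = _uv_prime(*xyz)
    un_p, vn_p = _uv_prime(*white)
    return L, 13.0 * L * (u_p - un_p), 13.0 * L * (v_p - vn_p)   # Eq 21.8

def delta_e(c1, c2):
    """Euclidean distance between two [L*, u*, v*] triples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(c1, c2)))

def chroma_hue_saturation_uv(L, u, v):
    """Polar form of [u*, v*] (Eq 21.10) plus saturation C*uv / L*."""
    C = math.hypot(u, v)
    h = math.degrees(math.atan2(v, u)) % 360.0   # hue angle in degrees
    s = C / L if L > 0.0 else 0.0
    return C, h, s

# D65 white: Table 21.1 chromaticity converted by Eq 21.2 with Yn = 1.
D65_WHITE = (0.312727 / 0.329024, 1.0, (1.0 - 0.312727 - 0.329024) / 0.329024)

gray = tuple(0.18 * c for c in D65_WHITE)
print(xyz_to_luv(gray, D65_WHITE))   # L* near 49.5, u* and v* near zero
```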
CIE L*a*b*

Providing that all of X/Xn, Y/Yn, and Z/Zn are greater
than 0.008856, a* and b* are computed as follows:

$$a^* = 500\left[\left(\frac{X}{X_n}\right)^{1/3} - \left(\frac{Y}{Y_n}\right)^{1/3}\right]; \qquad b^* = 200\left[\left(\frac{Y}{Y_n}\right)^{1/3} - \left(\frac{Z}{Z_n}\right)^{1/3}\right]$$   Eq 21.12

As in the L* definition, the transfer function
incorporates a linear segment. For any quantity X/Xn, Y/Yn, or
Z/Zn that is 0.008856 or smaller, denote that quantity t,
and instead of the cube root, use this quantity:

$$7.787\,t + \frac{16}{116}$$   Eq 21.13

For details, consult CIE Publication N° 15.2, cited in the
margin of page 216.

As in L*u*v*, one unit of Euclidean distance in L*a*b*,
denoted ΔE*ab, approximates the perceptibility of color
differences:

$$\Delta E^*_{ab} = \sqrt{(\Delta L^*)^2 + (\Delta a^*)^2 + (\Delta b^*)^2}$$   Eq 21.14

If ΔE*ab is unity or less, the color difference is taken to
be imperceptible. However, L*a*b* does not achieve
perceptual uniformity: It is merely an approximation.

A polar-coordinate version of the [a*, b*] pair can be
used to express chroma and hue:

$$C^*_{ab} = \sqrt{a^{*2} + b^{*2}}; \qquad h_{ab} = \tan^{-1}\frac{b^*}{a^*}$$   Eq 21.15

The equations that form a* and b* coordinates are not 
projective transformations; straight lines in [x,y] do not 
transform to straight lines in [a*, b*]. [a*, b*] coordi- 
nates can be plotted in two dimensions, but such a plot 
is not a chromaticity diagram. 
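The L*a*b* equations, including the 7.787t + 16/116 linear segment, can be sketched in Python as follows (hypothetical names). The sample is approximately the XYZ of a Rec. 709 red primary, used only for illustration; with a D65 white it should come out near L* ≈ 53, a* ≈ 80, b* ≈ 67.

```python
import math

def _f(t):
    """Cube root above 0.008856; otherwise the linear segment 7.787 t + 16/116."""
    return t ** (1.0 / 3.0) if t > 0.008856 else 7.787 * t + 16.0 / 116.0

def xyz_to_lab(xyz, white):
    """CIE L*a*b* of xyz relative to the reference white [Xn, Yn, Zn]."""
    tx, ty, tz = (c / n for c, n in zip(xyz, white))
    L = 903.3 * ty if ty <= 0.008856 else 116.0 * _f(ty) - 16.0   # Eq 21.3
    a = 500.0 * (_f(tx) - _f(ty))
    b = 200.0 * (_f(ty) - _f(tz))
    return L, a, b

def delta_e_ab(lab1, lab2):
    """Euclidean distance in L*a*b*."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))

def chroma_hue_ab(a, b):
    """Polar form of [a*, b*]; atan2 resolves the quadrant of the hue angle."""
    return math.hypot(a, b), math.degrees(math.atan2(b, a)) % 360.0

# D65 white: Table 21.1 chromaticity converted by Eq 21.2 with Yn = 1.
D65_WHITE = (0.312727 / 0.329024, 1.0, (1.0 - 0.312727 - 0.329024) / 0.329024)
SAMPLE_RED = (0.4124, 0.2126, 0.0193)   # illustrative saturated red

print([round(v, 1) for v in xyz_to_lab(SAMPLE_RED, D65_WHITE)])
```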

CIE L*u*v* and CIE L*a*b* summary

Both L*u*v* and L*a*b* improve the 80:1 or so percep- 
tual nonuniformity of XYZ to about 6:1. Both systems 
transform tristimulus values into a lightness component 
ranging from 0 to 100, and two color components 
ranging approximately ±100. One unit of Euclidean 
McCamy argues that under
normal conditions 1,875,000 
colors can be distinguished. See 
McCamy, C.S.,“On the Number 
of Discernable Colors," in Color 
Research and Application, 23 (5): 
337 (Oct. 1998). 



ITU-T Rec. T.42, Continuous-tone 
colour representation for facsimile 
distance in L*u*v* or L*a*b* corresponds roughly to
a just-noticeable difference (JND) of color. 

Consider that L* ranges 0 to 100, and each of u* and v*
ranges approximately ±100. A threshold of unity ΔE*uv
defines four million colors. About one million colors can
be distinguished by vision, so CIE L*u*v* is somewhat
conservative. A million colors - or even the four million
colors identified using a ΔE*uv or ΔE*ab threshold of
unity - are well within the capacity of the 16.7 million 
colors available in a 24-bit truecolor system that uses 
perceptually appropriate transfer functions, such as the 
function of Rec. 709. (However, 24 bits per pixel are far 
short of the number required for adequate perfor- 
mance with linear-light coding.) 
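The "four million" and "16.7 million" figures are simple products. This back-of-envelope sketch treats each unit step along L*, u*, and v* as one cell; it counts code capacity only, and (as the parenthetical notes) whether those codes land on usefully distinct colors depends on the transfer function.

```python
# Ranges quoted in the text.
L_RANGE = 100    # L* spans 0 to 100
UV_RANGE = 200   # u* and v* each span roughly -100 to +100

unit_cells = L_RANGE * UV_RANGE * UV_RANGE   # one cell per unit step on each axis
truecolor_codes = 2 ** 24                    # three 8-bit components

print(f"{unit_cells:,} unit-ΔE*uv cells vs {truecolor_codes:,} 24-bit codes")
```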

The L*u*v* or L*a*b* systems are most useful in color
specification. Both systems demand too much
computation for economical realtime video processing,
although both have been successfully applied to still
image coding, particularly for printing. The complexity
of the CIE L*u*v* and CIE L*a*b* calculations makes
these systems generally unsuitable for image coding.

The nonlinear R'G'B' coding used in video is quite
perceptually uniform, and has the advantage of being
suitable for realtime processing. Keep in mind that
R'G'B' typically incorporates significant gamut
limitation, whereas L*u*v* and CIE L*a*b* represent all
colors. L*a*b* is sometimes used in desktop graphics
with [a*, b*] coordinates ranging from -128 to +127
(e.g., Photoshop). The ITU-T Rec. T.42 standard for color
fax accommodates L*a*b* coding with a* ranging -85 to
+85, and b* ranging -75 to +125. Even with these
restrictions, CIE L*a*b* covers nearly all of the colors.
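A minimal sketch that tests whether an L*a*b* triple falls inside the ranges quoted above. It checks ranges only; it does not implement the bit-level T.42 encoding, which the text does not describe, and the constant and function names are hypothetical.

```python
# Closed [min, max] ranges for a* and b*, as quoted in the text.
DESKTOP_AB = (-128.0, 127.0)   # desktop-graphics range, used for both a* and b*
T42_A = (-85.0, 85.0)          # ITU-T Rec. T.42 range for a*
T42_B = (-75.0, 125.0)         # ITU-T Rec. T.42 range for b*


def within(value, bounds):
    lo, hi = bounds
    return lo <= value <= hi


def fits_ranges(lab, a_bounds, b_bounds):
    """True if L* lies in 0..100 and a*, b* lie in the given closed ranges."""
    l_star, a_star, b_star = lab
    return (within(l_star, (0.0, 100.0))
            and within(a_star, a_bounds)
            and within(b_star, b_bounds))


print(fits_ranges((50.0, 90.0, 0.0), T42_A, T42_B))  # a* exceeds the T.42 range
```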

Color specification 

A color specification system needs to be able to repre- 
sent any color with high precision. Since few colors are 
handled at a time, a specification system can be compu- 
tationally complex. A system for color specification 
must be intimately related to the CIE system. 

The systems useful for color specification are CIE XYZ 
and its derivatives xyY, L*u*v*, and L*a*b*. 
Figure 21.11 Color systems are classified into four groups that are related by different kinds of
transformations. Tristimulus systems, and perceptually uniform systems, are useful for image
coding. (I flag HSB, HSI, HSL, HSV, and IHS with a question mark: These systems lack objective
definition of color.)



Color image coding 

A color image is represented as an array of pixels, where 
each pixel contains three values that define a color. As 
you have learned in this chapter, three components are 
necessary and sufficient to define any color. (In printing 
it is convenient to add a fourth, black, component, 
giving CMYK.) 

In theory, the three numerical values for image coding 
could be provided by a color specification system. 
However, a practical image coding system needs to be 
computationally efficient, cannot afford unlimited preci- 
sion, need not be intimately related to the CIE system, 
and generally needs to cover only a reasonably wide 
range of colors and not all possible colors. So image 
coding uses different systems than color specification. 

The systems useful for image coding are linear RGB;
nonlinear RGB (usually denoted R'G'B', and including
sRGB); nonlinear CMY; nonlinear CMYK; and
derivatives of R'G'B', such as Y'CBCR and Y'PBPR. These
are summarized in Figure 21.11.
Wyszecki, Gunter, and W.S.
Stiles, Color Science: Concepts and
Methods, Quantitative Data and
Formulae, Second Edition (New
York: Wiley, 1982).

Judd, Deane B., and Gunter
Wyszecki, Color in Business,
Science, and Industry, Third Edition
(New York: Wiley, 1975).

Hunt, R.W.G., The Reproduction of
Colour in Photography, Printing &
Television, Fifth Edition (Tolworth,
England: Fountain Press, 1995).



If you manufacture cars, you have to match the paint on 
the door with the paint on the fender; color specifica- 
tion will be necessary. You can afford quite a bit of 
computation, because there are only two colored 
elements, the door and the fender. To convey a picture 
of the car, you may have a million colored elements or 
more: Computation must be quite efficient, and an 
image coding system is called for. 

Further reading 

The bible of colorimetry is Color Science, by Wyszecki
and Stiles. But it's daunting. For a condensed version,
read Judd and Wyszecki's Color in Business, Science, and
Industry. It is directed to the color industry: ink, paint,
and the like.

For an approachable introduction to color theory,
accompanied by practical descriptions of image
reproduction, consult Hunt's classic work.
Classical color science, explained in the previous
chapter, establishes the basis for numerical description 
of color. But color science is intended for the specifica- 
tion of color, not for image coding. Although an under- 
standing of color science is necessary to achieve good 
color performance in video, its strict application is 
impractical. This chapter explains the engineering 
compromises necessary to make practical cameras and 
practical coding systems. 

Video processing is generally concerned with color 
represented in three components derived from the 
scene, usually red, green, and blue, or components 
computed from these. Accurate color reproduction 
depends on knowing exactly how the physical spectra 
of the original scene are transformed into these compo- 
nents, and exactly how the components are trans- 
formed to physical spectra at the display. These issues 
are the subject of this chapter. 

Once red, green, and blue components of a scene are 
obtained, these components are transformed into other 
forms optimized for processing, recording, and trans- 
mission. This will be discussed in Component video color 
coding for SDTV, on page 301, and Component video 
color coding for HDTV, on page 313. (Unfortunately, 
color coding differs between SDTV and HDTV.) 

The previous chapter explained how to analyze SPDs of 
scene elements into XYZ tristimulus values representing 
color. The obvious way to reproduce those colors is to 
arrange for the reproduction system to reproduce those
XYZ values. That approach works in many applications
of color reproduction, and it's the basis for color in
video. However, in image reproduction, direct
recreation of the XYZ values is unsuitable for perceptual
reasons. Some modifications are necessary to achieve
subjectively acceptable results. Those modifications
were described in Constant luminance, on page 75.

If you are unfamiliar with the term
luminance, or the symbols Y or Y',
refer to Luminance and lightness,
on page 203.

Should you wish to skip this chapter, remember that 
accurate description of colors expressed in terms of RGB 
coordinates depends on the characterization of the RGB 
primaries and their power ratios (white reference). If 
your system is standardized to use a fixed set of prima- 
ries throughout, you need not be concerned about this; 
however, if your images use different primary sets, it is 
a vital issue. 

Additive reproduction (RGB) 

In the previous chapter, I explained how a physical SPD 
can be analyzed into three components that represent 
color. This section explains how those components can 
be mixed to reproduce color. 

The simplest way to reproduce a range of colors is to 
mix the beams from three lights of different colors, as 
sketched in Figure 22.1 opposite. In physical terms, the 
spectra from each of the lights add together wave- 
length by wavelength to form the spectrum of the 
mixture. Physically and mathematically, the spectra add: 
The process is called additive reproduction. 

I described Grassmann's Third Law on page 219: Color
vision obeys a principle of superposition, whereby the
color produced by any additive mixture of three primary
SPDs can be predicted by adding the corresponding
fractions of the XYZ tristimulus components of the
primaries. The colors that can be formed from
a particular set of RGB primaries are completely
determined by the colors (tristimulus values, or
luminance values and chromaticity coordinates) of the
individual primaries. Subtractive reproduction, used in
photography, cinema film, and commercial printing, is
much more complicated: Colors in subtractive mixtures
are not determined by the colors of the individual
primaries, but by their spectral properties.

Figure 22.1 Additive reproduction. This diagram illustrates the physical process underlying
additive color mixture, as is used in video. Each primary has an independent, direct path to the
image. The spectral power of the image is the sum of the spectra of the primaries. The colors of
the mixtures are completely determined by the colors of the primaries; analysis and prediction
of mixtures is reasonably simple. The SPDs shown here are those of a Sony Trinitron monitor.

Additive reproduction is employed directly in a video 
projector, where the spectra from a red beam, a green 
beam, and a blue beam are physically summed at the 
surface of the projection screen. Additive reproduction 
is also employed in a direct-view color CRT, but through 
slightly indirect means. The screen of a CRT comprises 
small phosphor dots (triads) that, when illuminated by 
their respective electron beams, produce red, green, 
and blue light. When the screen is viewed from 
a sufficient distance, the spectra of these dots add at 
the retina of the observer. 

The widest range of colors will be produced with prima- 
ries that individually appear red, green, and blue. When 
color displays were exclusively CRTs, RGB systems were 
characterized by the chromaticities of their phosphors. 
To encompass newer devices that form colors without 
using phosphors, we refer to primary chromaticities 
rather than phosphor chromaticities. 
Characterization of RGB primaries

An additive RGB system is specified by the chromatici- 
ties of its primaries and its white point. The extent - or 
gamut - of the colors that can be mixed from a given 
set of RGB primaries is given in the [x, y] chromaticity 
diagram by a triangle whose vertices are the chromatici- 
ties of the primaries. Figure 22.2 opposite plots the 
primaries of several contemporary video standards that 
I will describe. 
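Because the gamut is a triangle in [x, y], deciding whether a chromaticity is reproducible is a point-in-triangle test. A minimal sketch using the sign of 2-D cross products, with the Rec. 709 chromaticities of Table 22.5 as the default primaries; the names are hypothetical, and points exactly on an edge are counted as inside.

```python
# Chromaticities of the Rec. 709 primaries (red, green, blue), from Table 22.5.
REC709_PRIMARIES = ((0.640, 0.330), (0.300, 0.600), (0.150, 0.060))


def _cross(o, a, b):
    """z component of (a - o) x (b - o); its sign tells which side of line o->a point b is on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def inside_gamut_triangle(xy, primaries=REC709_PRIMARIES):
    """True if chromaticity xy lies inside (or on) the triangle spanned by the primaries."""
    r, g, b = primaries
    d1 = _cross(r, g, xy)
    d2 = _cross(g, b, xy)
    d3 = _cross(b, r, xy)
    has_negative = d1 < 0 or d2 < 0 or d3 < 0
    has_positive = d1 > 0 or d2 > 0 or d3 > 0
    # Inside means the point lies on the same side of all three edges.
    return not (has_negative and has_positive)


print(inside_gamut_triangle((0.3127, 0.3290)))  # D65 white point
print(inside_gamut_triangle((0.21, 0.71)))      # 1953 NTSC green
```

The test is independent of whether the primaries are listed clockwise or counterclockwise, since it only asks whether all three signs agree.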



CIE standards established in 
1964 were based upon mono- 
chromatic primaries at 444.4, 
526.3, and 645.2 nm. 



In computing there are no standard primaries or white 
point chromaticities, though the sRGB standard is 
becoming increasingly widely used. (I will describe 
sRGB below, along with Rec. 709.) If you have an RGB
image but no information about its primary
chromaticities, you cannot accurately reproduce the image.

CIE RGB primaries 

Color science researchers in the 1920s used monochro- 
matic primaries - that is, primaries whose chromaticity 
coordinates lie on the spectral locus. The particular 
primaries that led to the CIE standard in 1931 became 
known as the CIE primaries; their wavelengths are 
435.8 nm, 546.1 nm, and 700.0 nm, as documented in 
the CIE publication Colorimetry (cited on page 216). 
These primaries, enumerated in Table 22.1, are
historically important; however, they are not useful for
image coding or image reproduction.



Table 22.1 CIE primaries
were established for the CIE's
color-matching experiments;
they are unsuitable for image
coding or reproduction.

                     x         y         z
Red, 700.0 nm      0.73469   0.26531   0
Green, 546.1 nm    0.27368   0.71743   0.00890
Blue, 435.8 nm     0.16654   0.00888   0.82458
White, CIE Ill. B  0.34842   0.35161   0.29997



Figure 22.2 Primaries of video standards are plotted on the CIE 1931, 2° [x, y] chromaticity
diagram. The colors that can be represented in positive RGB values lie within the triangle joining
a set of primaries; here, the gray triangle encloses the Rec. 709 primaries. The Rec. 709 standard
specifies no tolerance. SMPTE tolerances are specified as ±0.005 in x and y. EBU tolerances are
shown as white quadrilaterals; they are specified in u', v' coordinates related to the color
discrimination of vision. The EBU tolerance boundaries are not parallel to the [x, y] axes.

NTSC primaries (obsolete)

In 1953, the NTSC standardized a set of primaries used
in experimental color CRTs at that time. Those
primaries and white reference are still documented in
ITU-R Report 624. But phosphors changed over the years,
primarily in response to market pressures for brighter
receivers, and by the time of the first videotape
recorder the primaries actually in use were quite
different from those "on the books." So although you
may see the NTSC primary chromaticities documented
(even in contemporary textbooks, and in standards for
image exchange!), they are of absolutely no practical
use today. I include them in Table 22.2, so you'll know
what primaries to avoid:



Table 22.2 NTSC primaries
(obsolete) were once used in
480i SDTV systems, but have
been superseded by SMPTE
RP 145 and Rec. 709 primaries.

                     x       y       z
Red                0.67    0.33    0
Green              0.21    0.71    0.08
Blue               0.14    0.08    0.78
White, CIE Ill. C  0.310   0.316   0.374



EBU Tech. 3213, EBU standard for 
chromaticity tolerances for studio 
monitors (Geneva: European 
Broadcasting Union, 1975; reis- 
sued 1981). 



The luma coefficients for the NTSC system (0.299,
0.587, and 0.114) were chosen in 1953, based upon
these primaries. Decades later, in 1984, these
luma coefficients were standardized in Rec. 601 
(described on page 291). Rec. 601 is silent concerning 
primary chromaticities. The primaries in use by 1984 
were quite different from the 1953 NTSC primaries. The 
luma coefficients in use for SDTV are no longer 
matched to the primary chromaticities. The discrepancy 
has little practical significance. 

EBU Tech. 3213 primaries 

Phosphor technology improved considerably in the 
decade following the adoption of the NTSC standard. In 
1966, the European Broadcasting Union (EBU)
standardized 576i color video (then denoted 625/50, or
just PAL). The primaries in Table 22.3 below are
standardized by EBU Tech. 3213. They are in use today
for 576i systems, and they are very close to the Rec. 709
primaries that I will describe in a moment:



Table 22.3 EBU Tech. 3213
primaries apply to 576i
SDTV systems.

                  x        y        z
Red             0.640    0.330    0.030
Green           0.290    0.600    0.110
Blue            0.150    0.060    0.790
White, D65      0.3127   0.3290   0.3582



The EBU retained, for PAL, the well-established NTSC 
luma coefficients. Again, the fact that the underlying 
primaries had changed has little practical significance. 
SMPTE RP 145, SMPTE C Color
Monitor Colorimetry.

ITU-R Rec. BT.709, Basic parameter
values for the HDTV standard for the
studio and for international
programme exchange.

Rec. 601 does not specify primary
chromaticities. It is implicit that
SMPTE RP 145 primaries are used
with 480i, and that EBU 3213
primaries are used with 576i.

SMPTE RP 145 primaries

For 480i SDTV, the primaries of SMPTE RP 145 are
standard, as specified in Table 22.4:

Table 22.4 SMPTE RP 145
primaries apply to 480i
SDTV systems, and to early
1035i30 HDTV systems.

                  x        y        z
Red             0.630    0.340    0.030
Green           0.310    0.595    0.095
Blue            0.155    0.070    0.775
White, D65      0.3127   0.3290   0.3582

RP 145 primaries are specified in SMPTE 240M for
1035i30 HDTV, and were once included as the "interim
implementation" provision of SMPTE standards for
1280x720 and 1920x1080 HDTV. The most recent
revisions of SMPTE standards for 1280x720 and
1920x1080 have dropped provisions for the "interim
implementation," and now specify only the Rec. 709
primaries, which I will now describe.

Rec. 709/sRGB primaries

International agreement was obtained in 1990 on
primaries for high-definition television (HDTV). The
standard is formally denoted Recommendation ITU-R
BT.709 (formerly CCIR Rec. 709). I'll call it Rec. 709.
Implausible though this sounds, the Rec. 709
chromaticities are a political compromise obtained by
choosing EBU red, EBU blue, and a green which is the
average (rounded to 2 digits) of EBU green and SMPTE
green! These primaries are closely representative of
contemporary monitors in studio video, computing, and
computer graphics. The Rec. 709 primaries and its D65
white point are specified in Table 22.5:

Table 22.5 Rec. 709 primaries
apply to 1280x720 and
1920x1080 HDTV systems;
they are incorporated into the
sRGB standard for desktop PCs.

                  x        y        z
Red             0.640    0.330    0.030
Green           0.300    0.600    0.100
Blue            0.150    0.060    0.790
White, D65      0.3127   0.3290   0.3582
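The claim that Rec. 709 green is the EBU and SMPTE greens averaged and rounded to two digits can be checked with a few lines of arithmetic. This sketch uses Python's `decimal` module so that the rounding step is exact decimal rounding rather than binary floating point; the helper name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

EBU_GREEN = (Decimal("0.290"), Decimal("0.600"))     # [x, y] from Table 22.3
SMPTE_GREEN = (Decimal("0.310"), Decimal("0.595"))   # [x, y] from Table 22.4


def rounded_average(p, q, places=Decimal("0.01")):
    """Average two [x, y] chromaticities, rounding each coordinate to two digits."""
    return tuple(((a + b) / 2).quantize(places, rounding=ROUND_HALF_UP)
                 for a, b in zip(p, q))


print(rounded_average(EBU_GREEN, SMPTE_GREEN))  # (Decimal('0.30'), Decimal('0.60'))
```

The unrounded average is [0.300, 0.5975]; rounding to two digits gives [0.30, 0.60], the Rec. 709 green of Table 22.5.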



Video standards specify RGB chromaticities that are 
closely matched to practical monitors. Physical display 
devices involve tolerances and uncertainties, but if you 
have a monitor that conforms to Rec. 709 within some 
tolerance, you can think of the monitor as being device- 
independent. 
IEC FDIS 61966-2-1, Multimedia
systems and equipment - Colour
measurement and management -
Part 2-1: Colour management -
Default RGB colour space - sRGB.



The Rec. 709 primaries are incorporated into the sRGB 
specification used in desktop computing. Beware that 
the sRGB transfer function is somewhat different from 
the transfer functions standardized for studio video. 
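To see how different the two transfer functions are, the sketch below encodes a few linear-light values with the sRGB encoding function of IEC 61966-2-1 and with the Rec. 709 camera transfer function. Both formulas are the published piecewise definitions; the function names are my own, and inputs are assumed to be relative linear light in the range 0 to 1.

```python
def srgb_encode(lin):
    """sRGB encoding (IEC 61966-2-1) of relative linear light in 0..1."""
    if lin <= 0.0031308:
        return 12.92 * lin
    return 1.055 * lin ** (1.0 / 2.4) - 0.055


def rec709_encode(lin):
    """Rec. 709 camera transfer function of relative linear light in 0..1."""
    if lin < 0.018:
        return 4.5 * lin
    return 1.099 * lin ** 0.45 - 0.099


for lin in (0.01, 0.18, 0.5, 1.0):
    print(f"{lin:5.2f}  sRGB {srgb_encode(lin):.4f}  Rec. 709 {rec709_encode(lin):.4f}")
```

Both functions map 0 to 0 and 1 to 1, but at 18% linear light (a midtone) they differ by about 0.05 of full scale, enough to be visible if one is mistaken for the other.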



The importance of Rec. 709 as an interchange standard 
in studio video, broadcast television, and HDTV, and 
the firm perceptual basis of the standard, assures that 
its parameters will be used even by such devices as flat- 
panel displays that do not have the same physics as 
CRTs. However, there is no doubt that emerging display 
technologies will soon offer a wider color gamut. 

CMFs and SPDs 

You might guess that you could implement a display 
whose primaries had spectral power distributions with 
the same shape as the CIE spectral analysis curves - the 
color-matching functions for XYZ. You could make such 
a display, but when driven by XYZ tristimulus values, it 
would not properly reproduce color. There are display 
primaries that reproduce color accurately when driven 
by XYZ tristimuli, but the SPDs of those primaries do 
not have the same shape as the x̄(λ), ȳ(λ), and z̄(λ)
CMFs. To see why requires understanding a very subtle 
and important point about color reproduction. 

To find a set of display primaries that reproduces color 
according to XYZ tristimulus values would require 
constructing three SPDs that, when analyzed by the 
x̄(λ), ȳ(λ), and z̄(λ) color-matching functions, produced
[1, 0, 0], [0, 1, 0], and [0, 0, 1], respectively. The x̄(λ),
ȳ(λ), and z̄(λ) CMFs are positive across the entire
spectrum. Producing [0, 1, 0] would require positive
contribution from some wavelengths in the required primary
SPDs. We could arrange that; however, there is no 
wavelength that contributes to Y that does not also 
contribute positively to X or Z. 

The solution to this dilemma is to force the X and Z 
contributions to zero by making the corresponding 
SPDs have negative power at certain wavelengths. 
Although this is not a problem for mathematics, or even 
for signal processing, an SPD with a negative portion is 
not physically realizable in a transducer for light, 
because light power cannot go negative. So we cannot
build a real display that responds directly to XYZ. But as 
you will see, the concept of negative SPDs - and 
nonphysical SPDs or nonrealizable primaries - is very 
useful in theory and in practice. 



To understand the mathematical 
details of color transforms, 
described in this section, you should 
be familiar with linear (matrix) 
algebra. If you are unfamiliar with 
linear algebra, see Strang, Gilbert, 
Introduction to Linear Algebra, 
Second Edition (Boston:
Wellesley-Cambridge, 1998).



There are many ways to choose nonphysical primary
SPDs that correspond to the x̄(λ), ȳ(λ), and z̄(λ)
color-matching functions. One way is to arbitrarily choose
three display primaries whose power is concentrated at
three discrete wavelengths. Consider three display
SPDs, each of which has some amount of power at
600 nm, 550 nm, and 470 nm. Sample the x̄(λ), ȳ(λ),
and z̄(λ) functions of the matrix given earlier in
Calculation of tristimulus values by matrix multiplication,
on page 218, at those three wavelengths. This yields the
tristimulus values shown in Table 22.6:



Table 22.6 Example primaries
are used to explain the necessity
of signal processing in accurate
color reproduction.

                   X        Y        Z
Red, 600 nm      1.0622   0.6310   0.0008
Green, 550 nm    0.4334   0.9950   0.0087
Blue, 470 nm     0.1954   0.0910   1.2876



These coefficients can be expressed as a matrix, where 
the column vectors give the XYZ tristimulus values 
corresponding to pure red, green, and blue at the 
display, that is, [1, 0, 0], [0, 1, 0], and [0, 0, 1]. It is 
conventional to apply a scale factor in such a matrix to 
cause the middle row to sum to unity, since we wish to 
achieve only relative matches, not absolute: 



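$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.618637 & 0.252417 & 0.113803 \\ 0.367501 & 0.579499 & 0.052999 \\ 0.000466 & 0.005067 & 0.749913 \end{bmatrix} \cdot \begin{bmatrix} R_{600\,\mathrm{nm}} \\ G_{550\,\mathrm{nm}} \\ B_{470\,\mathrm{nm}} \end{bmatrix} \tag{22.1}$$

This matrix is based upon
R, G, and B components with
unusual spectral distributions. For
typical R, G, and B, see Eq 22.8.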



That matrix gives the transformation from RGB to XYZ.
We are interested in the inverse transform, from XYZ to
RGB, so invert the matrix:

$$\begin{bmatrix} R_{600\,\mathrm{nm}} \\ G_{550\,\mathrm{nm}} \\ B_{470\,\mathrm{nm}} \end{bmatrix} = \begin{bmatrix} 2.179151 & -0.946884 & -0.263777 \\ -1.382685 & 2.327499 & 0.045336 \\ 0.007989 & -0.015138 & 1.333346 \end{bmatrix} \cdot \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \tag{22.2}$$

Michael Brill and R.W.G. Hunt
argue that R, G, and B tristimulus
values have no units. See Hunt,
R.W.G., "The Heights of the CIE
Colour-Matching Functions," in
Color Research and Application,
22 (5): 337 (Oct. 1997).
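The scaling and inversion that lead to Equations 22.1 and 22.2 can be reproduced numerically from Table 22.6. A minimal sketch, assuming NumPy is available (any linear-algebra package would do); the variable names are illustrative.

```python
import numpy as np

# Table 22.6: CIE color-matching function values at the three example wavelengths.
# Rows are X, Y, Z; columns are the red (600 nm), green (550 nm), and blue (470 nm)
# primaries, so each column is the XYZ tristimulus vector of one primary.
samples = np.array([
    [1.0622, 0.4334, 0.1954],
    [0.6310, 0.9950, 0.0910],
    [0.0008, 0.0087, 1.2876],
])

# Scale so the middle (Y) row sums to unity: only relative matches are needed.
rgb_to_xyz = samples / samples[1].sum()   # Eq 22.1
xyz_to_rgb = np.linalg.inv(rgb_to_xyz)    # Eq 22.2

np.set_printoptions(precision=6, suppress=True)
print(rgb_to_xyz)
print(xyz_to_rgb)
```

The negative entries in `xyz_to_rgb` are the signature of the nonphysical primaries discussed next: some XYZ inputs demand negative power at one of the three wavelengths.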


The column vectors of the matrix in Equation 22.2 give, 
for each primary, the weights of each of the three 
discrete wavelengths that are required to display unit 
XYZ tristimulus values. The color-matching functions for 
CIE XYZ are shown in Figure 22.3, CMFs for CIE XYZ
primaries, on page 244. Opposite those functions, in 
Figure 22.4, is the corresponding set of primary SPDs. 
As expected, the display primaries have some negative 
spectral components: The primary SPDs are nonphys- 
ical. Any set of primaries that reproduces color from 
XYZ tristimulus values is necessarily supersaturated, 
more saturated than any realizable SPD could be. 

To determine a set of physical SPDs that will reproduce 
color when driven from XYZ, consider the problem in 
the other direction: Given a set of physically realizable 
display primaries, what CMFs are suitable to directly 
reproduce color using mixtures of these primaries? 

In this case the matrix that relates RGB components to 
CIE XYZ tristimulus values is all-positive, but the CMFs 
required for analysis of the scene have negative 
portions: The analysis filters are nonrealizable. 

Figure 22.6 shows a set of primary SPDs conformant to
SMPTE 240M, similar to Rec. 709. Many different SPDs
can produce an exact match to these chromaticities.
The set shown is from a Sony Trinitron monitor.

Figure 22.5 shows the corresponding color-matching 
functions. As expected, the CMFs have negative lobes 
and are therefore not directly realizable. 

We conclude that we can use physically realizable 
analysis CMFs, as in the first example, where XYZ 
components are displayed directly. But this requires 
nonphysical display primary SPDs. Or we can use phys- 
ical display primary SPDs, but this requires nonphysical 
analysis CMFs. As a consequence of the way color 
vision works, there is no set of nonnegative display 
primary SPDs that corresponds to an all-positive set of 
analysis functions. 

The escape from this conundrum is to impose a 3x3 
matrix multiplication in the processing of the camera 
signals, instead of using the camera signals to directly 
drive the display. Consider these display primaries:
monochromatic red at 600 nm, monochromatic green
at 550 nm, and monochromatic blue at 470 nm. The
3x3 matrix of Equation 22.2 can be used to process
XYZ values into components suitable to drive that
display. Such signal processing is not just desirable; it is
a necessity for achieving accurate color reproduction!

A sensor element is a photosite.

In a "one-chip" camera, hardware
or firmware performs spatial
interpolation to reconstruct R, G, and
B at each photosite. In a
"three-chip" camera, the dichroic filters
are mounted on one or two glass
blocks. In optical engineering, a
glass block is called a prism, but it
is not the prism that separates the
colors, it is the dichroic filters.

Every color video camera or digital still camera needs to 
sense the image through three different spectral charac- 
teristics. Digital still cameras and consumer camcorders 
typically have a single area array CCD sensor ("one 
chip"); each 2x2 tile of the array has sensor elements 
covered by three different types of filter. Typically, filters 
appearing red, green, and blue are used; the green filter 
is duplicated onto two of the photosites in the 2x2 tile. 
This approach loses light, and therefore sensitivity. 

A studio video camera separates incoming light using 
dichroic filters operating as beam splitters; each compo- 
nent has a dedicated CCD sensor ("3 CCD"). Such an 
optical system separates different wavelength bands 
without absorbing any light, achieving high sensitivity. 

Figure 22.7 shows the set of spectral sensitivity func- 
tions implemented by the beam splitter and filter 
("prism") assembly of an actual video camera. The func- 
tions are positive everywhere across the spectrum, so 
the filters are physically realizable. However, rather poor 
color reproduction will result if these signals are used 
directly to drive a display having Rec. 709 primaries. 
Figure 22.8 shows the same set of camera analysis func- 
tions processed through a 3x3 matrix transform. The 
transformed components will reproduce color more 
accurately - the more closely these curves resemble the 
ideal Rec. 709 CMFs of Figure 22.5, the more accurate 
the camera's color reproduction will be. 

In theory, and in practice, using a linear matrix to 
process the camera signals can capture and reproduce 
all colors correctly. However, capturing all of the colors 
is seldom necessary in practice, as I will explain in the 
Gamut section below. Also, capturing the entire range 
of colors would incur a noise penalty, as I will describe 
in Noise due to matrixing, on page 252. 
Figure 22.3 CMFs for CIE XYZ primaries. To acquire all colors in a scene requires filters having
the CIE x̄(λ), ȳ(λ), and z̄(λ) spectral sensitivities. The functions are nonnegative, and therefore
could be realized in practice. However, these functions are seldom used in actual cameras or
scanners, for various engineering reasons.
Figure 22.4 SPDs for CIE XYZ primaries. To directly reproduce a scene that has been analyzed
using the CIE color-matching functions requires nonphysical primaries having negative excursions, 
which cannot be realized in practice. Many different sets are possible. In this hypothetical 
example, the power in each primary is concentrated at the same three discrete wavelengths, 470, 
550, and 600 nm. 
Figure 22.5 CMFs for Rec. 709 primaries. These analysis functions are theoretically correct to
acquire RGB components for display using Rec. 709 primaries. The functions are not directly real- 
izable in a camera or a scanner, due to their negative lobes. But they can be realized by a 3x3 
matrix transformation of the CIE XYZ color-matching functions of Figure 22.3. 
Figure 22.6 SPDs for Rec. 709 primaries. This set of SPDs has chromaticity coordinates that
conform to SMPTE RP 145, similar to Rec. 709. Many SPDs could produce the same chromaticity 
coordinates; this particular set is produced by a Sony Trinitron monitor. The red primary uses rare 
earth phosphors that produce very narrow spectral distributions, different from the phosphors 
used for green or blue. 
Figure 22.7 Analysis functions for a real camera. This set of spectral sensitivity functions is
produced by the dichroic color separation filters (prism) of a state-of-the-art CCD studio camera.
(Panels: spectral sensitivity of the blue, green, and red sensors.)
Figure 22.8 CMFs of an actual camera after matrixing for Rec. 709 primaries. These curves
result from the analysis functions of Figure 22.7, opposite, being processed through a 3x3 
matrix. Colors as "seen" by this camera will be accurate to the extent that these curves match 
the ideal CMFs for Rec. 709 primaries shown in Figure 22.5. 
Luminance coefficients



For the D65 reference
now standard in video, C⁻¹
is multiplied by the vector
[0.95, 1, 1.089].

See Rec. 601 luma,
SMPTE 240M-1988 luma,
and Rec. 709 luma, on pages
291 and following.



Relative luminance can be formed as a properly 
weighted sum of RGB tristimulus components. The 
luminance coefficients can be computed starting with 
the chromaticities of the RGB primaries, here expressed 
in a matrix: 



$$\mathbf{C} = \begin{bmatrix} x_r & x_g & x_b \\ y_r & y_g & y_b \\ z_r & z_g & z_b \end{bmatrix} \tag{22.3}$$



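Coefficients $J_r$, $J_g$, and $J_b$ are computed from the
chromaticities, and the white reference, as follows:

$$\begin{bmatrix} J_r \\ J_g \\ J_b \end{bmatrix} = \mathbf{C}^{-1} \cdot \begin{bmatrix} x_w / y_w \\ 1 \\ z_w / y_w \end{bmatrix} \tag{22.4}$$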



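Luminance can then be computed as follows:

$$Y = \begin{bmatrix} J_r\, y_r & J_g\, y_g & J_b\, y_b \end{bmatrix} \cdot \begin{bmatrix} R \\ G \\ B \end{bmatrix} \tag{22.5}$$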



This calculation can be extended to compute [X, Y, Z]
from [R, G, B] of the specified chromaticity. First,
compute a matrix T, which depends upon the primaries
and the white point of the [R, G, B] space:

$$\mathbf{T} = \mathbf{C} \cdot \begin{bmatrix} J_r & 0 & 0 \\ 0 & J_g & 0 \\ 0 & 0 & J_b \end{bmatrix} \tag{22.6}$$



The elements $J_r$, $J_g$, and $J_b$ of the diagonal matrix
have the effect of scaling the corresponding columns
(one per primary) of the chromaticity matrix, balancing
the primary contributions to achieve the intended
chromaticity of white. CIE tristimulus values [X, Y, Z]
are then computed from the specified [R, G, B] as follows:

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \mathbf{T} \cdot \begin{bmatrix} R \\ G \\ B \end{bmatrix} \tag{22.7}$$
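Equations 22.3 through 22.7 translate almost line for line into code. The sketch below is my own illustration, not the published SMPTE RP 177 procedure; it assumes NumPy, and it solves C · J = w with `numpy.linalg.solve` instead of forming C⁻¹ explicitly, which gives the same J for a well-conditioned 3x3 matrix. Constant and function names are hypothetical.

```python
import numpy as np


def rgb_to_xyz_matrix(primaries, white):
    """Build T (Eq 22.6) from primary chromaticities [(xr, yr), (xg, yg), (xb, yb)]
    and a white point (xw, yw), with white normalized to Y = 1."""
    # Eq 22.3: column k holds [x, y, z] of primary k, where z = 1 - x - y.
    C = np.array([
        [x for x, _ in primaries],
        [y for _, y in primaries],
        [1.0 - x - y for x, y in primaries],
    ])
    xw, yw = white
    # Eq 22.4: solve C . J = [xw/yw, 1, zw/yw] rather than forming C^-1 explicitly.
    J = np.linalg.solve(C, np.array([xw / yw, 1.0, (1.0 - xw - yw) / yw]))
    # Eq 22.6: scale each primary's column by its J coefficient.
    return C @ np.diag(J)


REC709 = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
D65 = (0.3127, 0.3290)
NTSC_1953 = [(0.67, 0.33), (0.21, 0.71), (0.14, 0.08)]
ILLUMINANT_C = (0.310, 0.316)

T_709 = rgb_to_xyz_matrix(REC709, D65)
T_ntsc = rgb_to_xyz_matrix(NTSC_1953, ILLUMINANT_C)

# Eq 22.5: the middle row of T holds the luminance coefficients.
print(np.round(T_709[1], 4))
print(np.round(T_ntsc[1], 4))
```

With Rec. 709 primaries and D65, the middle row should come out close to the Rec. 709 luminance coefficients 0.2126, 0.7152, and 0.0722; with the 1953 NTSC primaries and Illuminant C it should land within about 0.001 of the familiar 0.299, 0.587, and 0.114.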



As I explained in Constant luminance, on page 75, video
systems compute luma as a weighted sum of nonlinear
R'G'B' components. Even with the resulting
nonconstant-luminance errors, there is a second-order
benefit in using the "theoretical" coefficients. The
standard coefficients are computed as above, from the
1953 FCC NTSC primaries and CIE Illuminant C (for
SDTV and computer graphics), from SMPTE RP 145
primaries and CIE D65 (for 1035i HDTV), and from
Rec. 709 primaries and CIE D65 (for other HDTV
standards).

SMPTE RP 177, Derivation of Basic
Television Color Equations.

When constructing such a matrix for
fixed-point calculation, take care
when rounding to preserve unity
sum of the middle (luminance) row.

I'll describe gamut on page 255.
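The caution about fixed-point rounding is easy to demonstrate with the Rec. 709 luminance coefficients (the middle row of the Rec. 709 RGB-to-XYZ matrix). Rounding each coefficient independently to 8-bit fractions gives integers that sum to 255, not 256, so reference white would no longer map to full-scale luminance. The repair shown here, a largest-remainder adjustment, is one reasonable choice of my own, not a procedure taken from SMPTE RP 177; the function names and the bit depths are hypothetical.

```python
import math

REC709_LUMA = [0.212671, 0.715160, 0.072169]   # Rec. 709 luminance coefficients


def naive_fixed_point(coeffs, bits):
    """Round each coefficient independently to an integer multiple of 2**-bits."""
    scale = 1 << bits
    return [round(c * scale) for c in coeffs]


def unity_preserving_fixed_point(coeffs, bits):
    """Quantize so the integers sum exactly to 2**bits (largest-remainder method)."""
    scale = 1 << bits
    total = sum(coeffs)
    # Normalize first so the scaled values sum to exactly `scale` (up to float error).
    scaled = [c / total * scale for c in coeffs]
    result = [math.floor(s) for s in scaled]
    shortfall = scale - sum(result)
    # Give the missing units to the coefficients that lost the most to truncation.
    order = sorted(range(len(coeffs)), key=lambda i: scaled[i] - result[i], reverse=True)
    for i in order[:shortfall]:
        result[i] += 1
    return result


print(naive_fixed_point(REC709_LUMA, 8))
print(unity_preserving_fixed_point(REC709_LUMA, 8))
```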

Transformations between RGB and CIE XYZ 

RGB values in a particular set of primaries can be trans- 
formed to and from CIE XYZ by a 3x3 matrix trans- 
form. These transforms involve tristimulus values, that 
is, sets of three linear-light components that approxi- 
mate the CIE color-matching functions. CIE XYZ repre- 
sents a special case of tristimulus values. In XYZ, any 
color is represented by an all-positive set of values. 
SMPTE has standardized a procedure for computing 
these transformations. 

To transform from Rec. 709 RGB (with its D65 white
point) into CIE XYZ, use the following transform:



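$$
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix}
0.412453 & 0.357580 & 0.180423 \\
0.212671 & 0.715160 & 0.072169 \\
0.019334 & 0.119193 & 0.950227
\end{bmatrix} \cdot
\begin{bmatrix} R_{709} \\ G_{709} \\ B_{709} \end{bmatrix}
$$

Eq 22.8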



The middle row of this matrix gives the luminance coef- 
ficients of Rec. 709. Because white is normalized to 
unity, the middle row sums to unity. The column 
vectors are the XYZ tristimulus values of pure red, 
green, and blue. To recover primary chromaticities from 
such a matrix, compute little x and y for each RGB 
column vector. To recover the white point, transform 
RGB = [1, 1, 1] to XYZ, then compute x and y according
to Equation 21.1.
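The recipe above can be sketched in a few lines of Python, taking the coefficients of Equation 22.8 as input. Equation 21.1 is taken here to be the usual x = X/(X+Y+Z), y = Y/(X+Y+Z). The recovered chromaticities should land very close to the Rec. 709 primaries and to D65.

```python
M709 = [[0.412453, 0.357580, 0.180423],   # Eq 22.8
        [0.212671, 0.715160, 0.072169],
        [0.019334, 0.119193, 0.950227]]


def chromaticity(xyz):
    """(x, y) from tristimulus values [X, Y, Z]."""
    total = sum(xyz)
    return xyz[0] / total, xyz[1] / total


def primaries_and_white(m):
    # Column k of the matrix is the XYZ of primary k at full drive
    prims = [chromaticity([m[i][k] for i in range(3)]) for k in range(3)]
    # Transforming RGB = [1, 1, 1] yields the row sums of the matrix
    white = chromaticity([sum(row) for row in m])
    return prims, white


prims, white = primaries_and_white(M709)
for name, (x, y) in zip(("red", "green", "blue"), prims):
    print(f"{name:5s} x = {x:.4f}  y = {y:.4f}")
print(f"white x = {white[0]:.4f}  y = {white[1]:.4f}")
```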

To transform from CIE XYZ into Rec. 709 RGB, use the 
inverse of Equation 22.8: 



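$$
\begin{bmatrix} R_{709} \\ G_{709} \\ B_{709} \end{bmatrix} =
\begin{bmatrix}
3.240479 & -1.537150 & -0.498535 \\
-0.969256 & 1.875992 & 0.041556 \\
0.055648 & -0.204043 & 1.057311
\end{bmatrix} \cdot
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
$$

Eq 22.9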



This matrix has some negative coefficients: XYZ colors
that are out of gamut for Rec. 709 RGB transform to
RGB components where one or more components are
negative or greater than unity.
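The following sketch illustrates the out-of-gamut behavior just described. It applies Equation 22.9, tests whether each result lies in [0, 1], and shows the simple clipping a display would apply. The saturated green at chromaticity x = 0.0743, y = 0.8338 is an assumed point near the spectral locus, chosen for illustration. The rounding tolerance in `in_gamut` is an arbitrary choice.

```python
M709_INV = [[3.240479, -1.537150, -0.498535],   # Eq 22.9
            [-0.969256, 1.875992, 0.041556],
            [0.055648, -0.204043, 1.057311]]


def xyz_to_rec709(xyz):
    """Linear-light XYZ to linear-light Rec. 709 RGB (Eq 22.9)."""
    return [sum(M709_INV[i][k] * xyz[k] for k in range(3)) for i in range(3)]


def xyy_to_xyz(x, y, big_y):
    """Chromaticity (x, y) plus luminance Y to XYZ."""
    return [x * big_y / y, big_y, (1.0 - x - y) * big_y / y]


def in_gamut(rgb, tol=1e-4):
    """True if every component lies in [0, 1], allowing for coefficient rounding."""
    return all(-tol <= c <= 1.0 + tol for c in rgb)


def clip(rgb):
    """What a typical display does: clamp each component to [0, 1]."""
    return [min(1.0, max(0.0, c)) for c in rgb]


white = xyz_to_rec709([0.950456, 1.0, 1.088754])        # D65 white, Y = 1
green = xyz_to_rec709(xyy_to_xyz(0.0743, 0.8338, 0.5))  # assumed near-spectral green
print(white, in_gamut(white))
print(green, in_gamut(green), clip(green))
```

The clipped result is a legal RGB triple, but its chromaticity has moved onto the boundary of the Rec. 709 triangle.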

Any RGB image data, or any matrix that purports to 
relate RGB to XYZ, should indicate the chromaticities of 
the RGB primaries involved. If you encounter a matrix 
transform or image data without reference to any 
primary chromaticities, be very suspicious! Its origi- 
nator may be unaware that RGB values must be associ- 
ated with chromaticity specifications in order to have 
meaning for accurate color. 

Noise due to matrixing 

Even if it were possible to display colors in the outer 
reaches of the chromaticity diagram, there would be 
a great practical disadvantage in doing so. Consider 
a camera that acquires XYZ tristimulus components, 
then transforms to Rec. 709 RGB according to 
Equation 22.9. The coefficient 3.240479 in the upper 
left-hand corner of the matrix in that equation deter- 
mines the contribution from X at the camera into the 
red signal. An X component acquired with 1 mV of 
noise will inject 3.24 mV of noise into red: There is 
a noise penalty associated with the larger coefficients in 
the transform, and this penalty is quite significant in the 
design of a high-quality camera. 
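The noise argument can be made concrete with the coefficients of Equation 22.9. The first value printed is the direct gain from X into red described in the text. The per-channel totals assume, purely for illustration, that X, Y, and Z each carry independent noise of equal rms amplitude, so that noise powers add. Correlated noise would behave differently.

```python
import math

M709_INV = [[3.240479, -1.537150, -0.498535],   # Eq 22.9
            [-0.969256, 1.875992, 0.041556],
            [0.055648, -0.204043, 1.057311]]

# Direct path from the text: noise on X reaches red scaled by M[0][0]
print(f"1 mV on X alone -> {1.0 * M709_INV[0][0]:.2f} mV on red")


def output_noise_rms(m, input_rms=1.0):
    """rms noise on each output, ASSUMING independent, equal-rms noise on
    X, Y and Z, so that noise powers (squared gains) add."""
    return [input_rms * math.sqrt(sum(c * c for c in row)) for row in m]


for name, n in zip("RGB", output_noise_rms(M709_INV)):
    print(f"{name}: {n:.2f} mV rms per 1 mV rms on each of X, Y, Z")
```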

Transforms among RGB systems 

RGB values in a system employing one set of primaries 
can be transformed to another set by a 3x3 linear-light 
matrix transform. 

[R, G, B] tristimulus values in a source space (denoted
with the subscript s) can be transformed into [R, G, B]
tristimulus values in a destination space (denoted with
the subscript d), using matrices $T_s$ and $T_d$ computed
from the corresponding chromaticities and white
points:



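$$
\begin{bmatrix} R_d \\ G_d \\ B_d \end{bmatrix} = T_d^{-1} \cdot T_s \cdot \begin{bmatrix} R_s \\ G_s \\ B_s \end{bmatrix}
$$

Eq 22.10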
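Equation 22.10 can be sketched by reusing the construction of T from Equations 22.3 through 22.6. The chromaticities below are my own assumptions for this sketch, not values taken from this section: SMPTE RP 145 primaries at x, y = (0.630, 0.340), (0.310, 0.595), (0.155, 0.070); EBU 3213 primaries at (0.640, 0.330), (0.290, 0.600), (0.150, 0.060); and D65 white at (0.3127, 0.3290) for all three systems. With those inputs, the results should come out close to the matrices of Equations 22.11 and 22.12 that follow.

```python
# Sketch of Eq 22.10: M = inv(T_d) . T_s, with T built as in Eqs 22.3-22.6.

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inv3(m):
    """3x3 inverse via transposed cofactors."""
    d = det3(m)
    inv = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            r = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
            inv[j][i] = (-1) ** (i + j) * (r[0][0] * r[1][1] - r[0][1] * r[1][0]) / d
    return inv


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rgb_to_xyz_matrix(primaries, white):
    """T = C . diag(J), with J = inv(C) . [xw/yw, 1, zw/yw]."""
    c = [[x for x, _ in primaries],
         [y for _, y in primaries],
         [1.0 - x - y for x, y in primaries]]
    xw, yw = white
    w = [xw / yw, 1.0, (1.0 - xw - yw) / yw]
    ci = inv3(c)
    j = [sum(ci[i][k] * w[k] for k in range(3)) for i in range(3)]
    return [[c[i][k] * j[k] for k in range(3)] for i in range(3)]


def rgb_to_rgb_matrix(src, dst, white):
    """Eq 22.10: source RGB -> XYZ -> destination RGB (linear light only)."""
    return matmul(inv3(rgb_to_xyz_matrix(dst, white)),
                  rgb_to_xyz_matrix(src, white))


D65 = (0.3127, 0.3290)
REC709 = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
RP145 = [(0.630, 0.340), (0.310, 0.595), (0.155, 0.070)]    # assumed
EBU3213 = [(0.640, 0.330), (0.290, 0.600), (0.150, 0.060)]  # assumed

for name, src in (("RP 145", RP145), ("EBU 3213", EBU3213)):
    print(name, "-> Rec. 709")
    for row in rgb_to_rgb_matrix(src, REC709, D65):
        print("  ".join(f"{v:+.6f}" for v in row))
```

EBU 3213 shares its red and blue primaries with Rec. 709. As a result, the computed EBU matrix has the same pattern of zeros as Equation 22.12, and its middle row is [0, 1, 0] apart from floating-point rounding. Every row sums to unity because white maps to white.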
As an example, here is the transform from SMPTE
RP 145 RGB (e.g., SMPTE 240M) to Rec. 709 RGB:





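$$
\begin{bmatrix} R_{709} \\ G_{709} \\ B_{709} \end{bmatrix} =
\begin{bmatrix}
0.939555 & 0.050173 & 0.010272 \\
0.017775 & 0.965795 & 0.016430 \\
-0.001622 & -0.004371 & 1.005993
\end{bmatrix} \cdot
\begin{bmatrix} R_{145} \\ G_{145} \\ B_{145} \end{bmatrix}
$$

Eq 22.11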



This matrix transforms EBU 3213 RGB to Rec. 709: 



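$$
\begin{bmatrix} R_{709} \\ G_{709} \\ B_{709} \end{bmatrix} =
\begin{bmatrix}
1.044036 & -0.044036 & 0 \\
0 & 1 & 0 \\
0 & 0.011797 & 0.988203
\end{bmatrix} \cdot
\begin{bmatrix} R_{EBU} \\ G_{EBU} \\ B_{EBU} \end{bmatrix}
$$

Eq 22.12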



To transform typical Sony Trinitron RGB, with D65 white
reference, to Rec. 709, use this transform:





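$$
\begin{bmatrix} R_{709} \\ G_{709} \\ B_{709} \end{bmatrix} =
\begin{bmatrix}
1.068706 & -0.078595 & 0.009890 \\
0.024110 & 0.960070 & 0.015819 \\
0.001735 & 0.029748 & 0.968517
\end{bmatrix} \cdot
\begin{bmatrix} R_{SONY} \\ G_{SONY} \\ B_{SONY} \end{bmatrix}
$$

Eq 22.13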



Transforming among RGB systems may lead to an out of 
gamut RGB result, where one or more RGB compo- 
nents are negative or greater than unity. 

These transformations produce accurate results only 
when applied to tristimulus (linear-light) components. 

In principle, to transform nonlinear R'G'B' from one 
primary system to another requires application of the 
inverse transfer function to recover the tristimulus 
values, computation of the matrix multiplication, then 
reapplication of the transfer function. However, the
transformation matrices of Equations 22.11, 22.12, and
22.13 are similar to the identity matrix: The diagonal
terms are nearly unity, and the off-diagonal terms are
nearly zero. In these cases, if the transform is computed
in the nonlinear (gamma-corrected) R'G'B' domain, the
resulting errors will be small.
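To see how small the error is in practice, this sketch compares two paths using the EBU-to-Rec. 709 matrix of Equation 22.12. The exact path decodes, applies the matrix, and re-encodes. The shortcut applies the matrix directly to R'G'B'. I assume the Rec. 709 transfer function of Chapter 23 (Equations 23.2 and 23.4) as the nonlinearity. The sample colors are arbitrary in-gamut values.

```python
M_EBU_TO_709 = [[1.044036, -0.044036, 0.0],     # Eq 22.12
                [0.0, 1.0, 0.0],
                [0.0, 0.011797, 0.988203]]


def encode_709(L):
    """Rec. 709 transfer function (Eq 23.2), for 0 <= L <= 1."""
    return 4.5 * L if L < 0.018 else 1.099 * L ** 0.45 - 0.099


def decode_709(v):
    """Inverse of Eq 23.2 (Eq 23.4), for 0 <= v <= 1."""
    return v / 4.5 if v < 0.081 else ((v + 0.099) / 1.099) ** (1 / 0.45)


def apply(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def convert_exact(rgb_prime):
    """Decode to tristimulus values, matrix in linear light, re-encode."""
    linear = apply(M_EBU_TO_709, [decode_709(c) for c in rgb_prime])
    return [encode_709(c) for c in linear]


def convert_approx(rgb_prime):
    """Shortcut: apply the same matrix directly to gamma-corrected R'G'B'."""
    return apply(M_EBU_TO_709, rgb_prime)


for sample in ([0.4, 0.7, 0.9], [0.6, 0.3, 0.2], [0.5, 0.5, 0.5]):
    exact, approx = convert_exact(sample), convert_approx(sample)
    err = max(abs(a - b) for a, b in zip(exact, approx))
    print(sample, [round(c, 4) for c in exact], f"max error {err:.4f}")
```

Each row of the matrix sums to unity, so a neutral gray passes through both paths unchanged. The discrepancy appears only for colored samples, and one would expect it to grow as the off-diagonal terms grow.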

Camera white reference 

SMPTE RP 71, Setting Chromaticity
and Luminance of White for Color
Television Monitors Using Shadow-Mask
Picture Tubes.

There is an implicit assumption in television that the
camera operates as if the scene were illuminated by
a source having the chromaticity of CIE D65. In practice,
television studios are often lit by tungsten lamps,
and scene illumination is often deficient in the shortwave
(blue) region of the spectrum. This situation is
compensated by white balancing, that is, by adjusting
the gain of the red, green, and blue components at the
camera so that a diffuse white object reports the values
that would be reported if the scene illumination had
the same tristimulus values as CIE D65. In studio
cameras, controls for white balance are available. In
consumer cameras, activating white balance causes the
camera to integrate red, green, and blue over the
picture, and to adjust the gains so as to equalize the
sums. (This approach to white balancing is sometimes
called integrate to gray.)
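The integrate-to-gray approach can be sketched as follows. For this illustration I normalize the gains so that green is left unchanged. A real camera may normalize differently, and would typically also exclude clipped or very dark pixels. The scene values are made up.

```python
def integrate_to_gray(pixels):
    """pixels: list of linear-light (r, g, b) triples.
    Returns per-channel gains that equalize the R, G and B sums,
    normalized (an assumption here) so that the green gain is 1."""
    sums = [sum(p[i] for p in pixels) for i in range(3)]
    return [sums[1] / s for s in sums]


def apply_gains(pixels, gains):
    return [tuple(c * g for c, g in zip(p, gains)) for p in pixels]


# Hypothetical tungsten-lit scene: blue is deficient
scene = [(0.8, 0.6, 0.3), (0.4, 0.35, 0.2), (0.9, 0.8, 0.5)]
gains = integrate_to_gray(scene)
balanced = apply_gains(scene, gains)
print("gains:", [round(g, 3) for g in gains])
print("balanced sums:", [round(sum(p[i] for p in balanced), 6) for i in range(3)])
```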

Monitor white reference 

In additive mixture, the illumination of the reproduced 
image is generated entirely by the display device. In 
particular, reproduced white is determined by the char- 
acteristics of the display, and is not dependent on the 
environment in which the display is viewed. In a 
completely dark viewing environment, such as a cinema 
theater, this is desirable; a wide range of chromaticities
is accepted as "white." However, in an environment
where the viewer's field of view encompasses objects 
other than the display, the viewer's notion of "white" is 
likely to be influenced or even dominated by what he 
or she perceives as "white" in the ambient. To avoid 
subjective mismatches, the chromaticity of white repro- 
duced by the display and the chromaticity of white in 
the ambient should be reasonably close. SMPTE has 
standardized the chromaticity of reference white in 
studio monitors; in addition, the standard specifies that 
luminance for reference white be reproduced at
103 cd·m⁻².

Modern blue CRT phosphors are more efficient with 
respect to human vision than red or green phosphors. 
Until recently, brightness was valued in computer moni- 
tors more than color accuracy. In a quest for a small 
brightness increment at the expense of a loss of color 
accuracy, computer monitor manufacturers adopted 
a white point having a color temperature of about 
9300 K, producing a white having about 1.3 times as 
much blue as the standard CIE white reference 
used in television. So, computer monitors and 
computer pictures often look excessively blue. The
situation can be corrected by adjusting or calibrating
the monitor to a white reference with a lower color
temperature. (Studio video standards in Japan call for
viewing with a 9300 K white reference; this is apparently
due to a cultural preference regarding the reproduction
of skin tones.)

Gamut 

Analyzing a scene with the CIE analysis functions 
produces distinct component triples for all colors. But 
when transformed into components suitable for a set of 
physical display primaries, some of those colors (those
colors whose chromaticity coordinates lie outside the
triangle formed by the primaries) will have negative
component values. In addition, colors outside the
triangle of the primaries may have one or two primary 
components that exceed unity. These colors cannot be 
correctly displayed. Display devices typically clip signals 
that have negative values and saturate signals whose 
values exceed unity. Visualized on the chromaticity 
diagram, a color outside the triangle of the primaries is 
reproduced at a point on the boundary of the triangle. 

If a scanner is designed to capture all colors, its 
complexity is necessarily higher and its performance is 
necessarily worse than a camera designed to capture 
a smaller range of colors. Thankfully, the range of colors 
encountered in the natural and man-made world is 
a small fraction of all of the colors. Although it is neces- 
sary for an instrument such as a colorimeter to measure 
all colors, in an imaging system we are generally 
concerned with colors that occur frequently. 

M.R. Pointer characterized the distribution of 
frequently occurring real surface colors. The naturally 
occurring colors tend to lie in the central portion of the 
chromaticity diagram, where they can be encompassed 
by a well-chosen set of physical primaries. An imaging 
system performs well if it can display all or most of 
these colors. Rec. 709 does reasonably well; however, 
many of the colors of conventional offset printing - 
particularly in the cyan region - are not encompassed 
by all-positive Rec. 709 RGB. To accommodate such 
colors requires wide-gamut reproduction. 
Wide-gamut reproduction

For much of the history of color television, cameras 
were designed to incorporate assumptions about the 
color reproduction capabilities of color CRTs. But nowa- 
days, video production equipment is being used to 
originate images for a much wider range of applications 
than just television broadcast. The desire to make 
digital cameras suitable for originating images for this 
wider range of applications has led to proposals for 
video standards that accommodate a wider gamut. 

I will introduce the Rec. 1361 transfer function, on 
page 265. That transfer function is intended to be the 
basis for wide-gamut reproduction in future HDTV 
systems. The Rec. 1361 function is intended for use 
with RGB tristimulus values having Rec. 709 primaries. 
However, the RGB values can occupy a range from 
-0.25 to +1.33, well outside the range 0 to 1. The 
excursions below zero and above unity allow Rec. 1361 
RGB values to represent colors outside the triangle 
enclosed by the Rec. 709 primaries. When the
extended R'G'B' values are matrixed, the resulting
Y'CbCr values lie within the "valid" range: Regions of
Y'CbCr space outside the "legal" RGB cube are
exploited to convey wide-gamut colors. For details, see
CBCR components for Rec. 1361 HDTV, on page 318.

Further reading 

For a highly readable short introduction to color image 
coding, consult DeMarsh and Giorgianni. For a terse, 
complete technical treatment, read Schreiber (cited in 
the margin of page 20). 

For a discussion of nonlinear RGB in computer graphics, 
read Lindbloom's SIGGRAPH paper. 

In a computer graphics system, once light is on its way 
to the eye, any tristimulus-based system can accurately 
represent color. However, the interaction of light and 
objects involves spectra, not tristimulus values. In 
computer-generated imagery (CGI), the calculations 
actually involve sampled SPDs, even if only three 
components are used. Roy Hall discusses these issues. 
23

Gamma



Luminance is proportional to intensity.
For an introduction to the
terms brightness, intensity, luminance,
and lightness, see page 11.



In photography, video, and computer graphics, the
gamma symbol, $\gamma$, represents a numerical parameter
that describes the nonlinearity of luminance reproduction.
Gamma is a mysterious and confusing subject,
because it involves concepts from four disciplines: 
physics, perception, photography, and video. This 
chapter explains how gamma is related to each of these 
disciplines. Having a good understanding of the theory 
and practice of gamma will enable you to get good 
results when you create, process, and display pictures. 



This chapter focuses on electronic reproduction of 
images, using video and computer graphics techniques 
and equipment. I deal mainly with the reproduction of 
luminance, or, as a photographer would say, tone scale. 
Achieving good tone reproduction is one important 
step toward achieving good color reproduction. (Other 
issues specific to color reproduction were presented in 
the previous chapter, Color science for video.) 



Electro-optical transfer function 
(EOTF) refers to the transfer func- 
tion of the device that converts 
from the electrical domain of 
video into light - a display. 



A cathode-ray tube (CRT) is inherently nonlinear: The 
luminance produced at the screen of a CRT is 
a nonlinear function of its voltage input. From a strictly 
physical point of view, gamma correction in video and 
computer graphics can be thought of as the process of 
compensating for this nonlinearity in order to achieve 
correct reproduction of relative luminance. 



Opto-electronic transfer function
(OETF) refers to the transfer function
of a scanner or camera.

As introduced in Nonlinear image coding, on page 12,
and detailed in Luminance and lightness, on page 203,
the human perceptual response to luminance is quite
nonuniform: The lightness sensation of vision is roughly
the 0.4-power function of luminance. This characteristic
needs to be considered if an image is to be coded
to minimize the visibility of noise, and to make effective
perceptual use of a limited number of bits per
pixel.

Combining these two concepts - one from physics, the 
other from perception - reveals an amazing coinci- 
dence: The nonlinearity of a CRT is remarkably similar 
to the inverse of the lightness sensitivity of human 
vision. Coding luminance into a gamma-corrected signal 
makes maximum perceptual use of the channel. If 
gamma correction were not already necessary for phys- 
ical reasons at the CRT, we would have to invent it for 
perceptual reasons. 
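The coincidence can be written as a one-line worked equation, assuming an idealized CRT that follows a pure 2.5-power law with no black offset:

$$
L = V^{2.5} \quad\Longrightarrow\quad V = L^{1/2.5} = L^{0.4}
$$

The 0.4-power needed to drive such a CRT is the same exponent that approximates lightness as a function of luminance.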

I will describe how video draws aspects of its handling 
of gamma from all of these areas: knowledge of the CRT 
from physics, knowledge of the nonuniformity of vision 
from perception, and knowledge of viewing conditions 
from photography. I will also discuss additional details 
of the CRT transfer function that you will need to know 
if you wish to calibrate a CRT or determine its nonlin- 
earity. 

Gamma in CRT physics 

The physics of the electron gun of a CRT imposes 
a relationship between voltage input and light output 
that a physicist calls a five-halves power law: The luminance
of light produced at the face of the screen is
proportional to voltage input raised to the 5/2 power.
Luminance is roughly between the square and cube of the
voltage. The numerical value of the exponent of this
power function is represented by the Greek letter $\gamma$
(gamma). CRT monitors have voltage inputs that reflect
this power function. In practice, most CRTs have
a numerical value of gamma quite close to 2.5.

Figure 23.1 opposite is a sketch of the power function 
that applies to the electron gun of a grayscale CRT, or 
to each of the red, green, and blue electron guns of 
a color CRT. The three guns of a color CRT exhibit very 
similar, but not necessarily identical, responses. 
Figure 23.1 CRT transfer function involves a nonlinear relationship between video signal and
luminance, graphed here for an actual CRT at three different settings of the contrast control. Luminance
is approximately proportional to input signal voltage raised to the 2.5 power. The gamma of a display
system (or more specifically, a CRT) is the numerical value of the exponent of the power function.
Here I show the contrast control varying luminance, on the y-axis; however, owing to the
mathematical properties of a power function, scaling the voltage input would yield the identical effect.
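The last point of the caption follows from the identity $(kV)^{\gamma} = k^{\gamma} V^{\gamma}$. The sketch below checks it numerically for a hypothetical idealized CRT with a pure 2.5-power law and no black offset. The `gain` parameter stands in for the contrast control.

```python
def crt_luminance(v, gamma=2.5, gain=1.0):
    """Idealized CRT: luminance = gain * v**gamma (no black-level offset)."""
    return gain * v ** gamma


k = 0.8  # hypothetical 20% reduction
for v in (0.25, 0.5, 1.0):
    via_input = crt_luminance(k * v)                 # scale the voltage
    via_contrast = crt_luminance(v, gain=k ** 2.5)   # scale the luminance
    print(f"v = {v}: {via_input:.6f} vs {via_contrast:.6f}")
```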



The nonlinearity in the voltage-to-luminance function 
of a CRT originates with the electrostatic interaction 
between the cathode, the grid, and the electron beam. 
The function is influenced to a small extent by the 
mechanical structure of the electron gun. Contrary to 
popular opinion, the CRT phosphors themselves are 
quite linear, at least up to the onset of saturation at 
a luminance of about eight-tenths of maximum. 

I denote the exponent the decoding gamma, $\gamma_D$.



Gamma correction involves
a power function, which has the
form $y = x^a$ (where $a$ is constant).
It is sometimes incorrectly
claimed to be an exponential
function, which has the form
$y = a^x$ (where $a$ is constant).

Gamma correction is unrelated
to the gamma function $\Gamma(x)$ of
mathematics.



In a video camera, we precompensate for the CRT's 
nonlinearity by processing each of the R, G, and B tris- 
timulus signals through a nonlinear transfer function. 
This process is known as gamma correction. The function
required is approximately a square root. The curve
is often not precisely a power function; nonetheless,
I denote the best-fit exponent the encoding gamma, $\gamma_E$.
In video, gamma correction is accomplished by analog 
(or sometimes digital) circuits at the camera. In 
computer graphics, gamma correction is usually accom- 
plished by incorporating the nonlinear transfer function 
into a framebuffer's lookup table. 
The value of decoding gamma ($\gamma_D$) for a typical, properly
adjusted CRT ranges from about 2.35 to 2.55.
Computer graphics practitioners sometimes claim 
numerical values of gamma wildly different from 2.5; 
however, such measurements often disregard two 
issues. First, the largest source of variation in the 
nonlinearity of a monitor is careless setting of the 
brightness (or black level) control. Before a sensible 
measurement of gamma can be made, this control must 
be adjusted, as outlined on page 26, so that black 
elements in the picture are correctly reproduced. 
Second, computer systems often have lookup tables 
(LUTs) that effect control over transfer functions. 

A gamma value dramatically different from 2.5 is often 
due to the function loaded into the LUT. A Macintosh is 
often said to have a gamma of 1.8; however, this value 
is a consequence of the default Macintosh LUT! The 
Macintosh monitor itself has gamma between about 
2.35 and 2.55. 
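To illustrate how a LUT changes the apparent gamma, here is a sketch in which a LUT exponent is chosen, hypothetically, so that the LUT followed by a 2.5-power CRT behaves as a 1.8-power system. The value 0.72 is simply 1.8/2.5. This is not a claim about the contents of any actual Macintosh LUT.

```python
import math

MONITOR_GAMMA = 2.5                             # physical CRT exponent
SYSTEM_GAMMA = 1.8                              # figure often quoted for a Macintosh
LUT_EXPONENT = SYSTEM_GAMMA / MONITOR_GAMMA     # hypothetical: 0.72


def lut(code):
    """Hypothetical framebuffer LUT: normalized code -> normalized voltage."""
    return code ** LUT_EXPONENT


def crt(voltage):
    return voltage ** MONITOR_GAMMA


def system(code):
    return crt(lut(code))


# The exponent someone measuring code -> light (not voltage -> light) would report
for code in (0.25, 0.5, 0.75):
    print(code, round(math.log(system(code)) / math.log(code), 6))
```

The monitor is still a 2.5-power device. The 1.8 figure describes the combination of LUT and monitor.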



Getting the physics right is an important first step 
toward proper treatment of gamma, but it isn't the 
whole story, as you will see. 

The amazing coincidence! 

In Luminance and lightness, on page 203, I described 
the nonlinear relationship between luminance 
(a physical quantity) and lightness (a perceptual quan- 
tity): Lightness is approximately luminance raised to the 
0.4-power. The previous section described how the 
nonlinear transfer function of a CRT relates a voltage 
signal to luminance. Here's the surprising coincidence: 
The CRT voltage-to-luminance function is very nearly 
the inverse of the luminance-to-lightness relationship. 

In analog systems, we represent lightness information 
as a voltage, to be transformed into luminance by 
a CRT's power function. In a digital system, we simply 
digitize analog voltage. To minimize the perceptibility 
of noise, we should use a perceptually uniform code. 
Amazingly, the CRT function is a near-perfect inverse of 
vision's lightness sensitivity: CRT voltage is effectively 
a perceptually uniform code! 
Gamma in video



Many video engineers are 
unfamiliar with color science. They 
consider only the first of these two 
purposes, and disregard, or remain 
ignorant of, the great importance 
of perceptually uniform coding. 



In a video system, gamma correction is applied at the 
camera for the dual purposes of precompensating the 
nonlinearity of the display's CRT and coding into 
perceptually uniform space. Figure 23.2 summarizes the 
image reproduction situation for video. At the left, 
gamma correction is imposed at the camera; at the 
right, the display imposes the inverse function. 



Coding into a perceptual domain was important in the 
early days of television because of the need to mini- 
mize the noise introduced by over-the-air analog trans- 
mission. However, the same considerations of noise 
visibility apply to analog videotape recording. These 
considerations also apply to the quantization error that 
is introduced upon digitization, when a signal repre- 
senting luminance is quantized to a limited number of 
bits. Consequently, it is universal to convey video 
signals in gamma-corrected form. 










Figure 23.2 Image reproduction in video. 

Luminance from the scene is reproduced at the 
display, with a suitable scale factor because we do not seek 
to reproduce the absolute luminance level of the scene. However, 
the ability of vision to detect that two luminance levels differ is not 
uniform from black to white, but is approximately a constant ratio - about 1% - 
of the luminance. In video, luminance from the scene is transformed by a function similar 
to a square root into a nonlinear, perceptually uniform signal that is transmitted. The 
camera is designed to mimic the human visual system, in order to "see" lightness in the 
scene the same way that a human observer would. Noise introduced by the transmission 
system then has minimum perceptual impact. The nonlinear signal is transformed back to 
luminance at the display. In a CRT, a 2.5-power function is intrinsic. 



CHAPTER 23 



GAMMA 



261 





The importance of rendering intent,
and the consequent requirement
for different exponents for
encoding ($\gamma_E$) and decoding ($\gamma_D$),
has been poorly recognized and
poorly documented in the development
of video.



Eq 23.1

$$
\gamma_E \approx 0.5; \qquad \gamma_D = 2.5; \qquad \gamma_E \cdot \gamma_D \approx 1.25
$$



As I explained in Rendering intent, on page 81, it is 
important for perceptual reasons to alter the tone scale 
of an image reproduced at a luminance substantially 
lower than that of the original scene, reproduced with 
limited contrast ratio, or viewed in a dim surround. The 
dim surround condition is characteristic of television 
viewing. In video, the alteration is accomplished at the 
camera by slightly undercompensating the actual power 
function of the CRT, to obtain an end-to-end power 
function whose exponent is about 1.25, as indicated in 
Equation 23.1 in the margin. This achieves end-to-end 
reproduction that is subjectively correct (though not 
mathematically linear). 



Optoelectronic transfer functions (OETFs) 

Unfortunately, several different transfer functions have 
been standardized and are in use. In the sections to 
follow, I will detail these transfer function standards: 



• Rec. 709 is an international standard that specifies the 
basic parameters of HDTV. Although the Rec. 709 
transfer function is intended for HDTV, it is representa- 
tive of current SDTV technology, and it is being retro- 
fitted into SDTV studio standards. 



• SMPTE 240M was the first studio HDTV standard; its 
transfer function remains in use in some HDTV equip- 
ment deployed today. Recent revisions of SMPTE stan- 
dards call for the Rec. 709 transfer function, but 
previous revisions allowed an "interim implementa- 
tion" using the transfer function of SMPTE 240M-1988. 

• Rec. 1361 extends the Rec. 709 coding to accommo- 
date a wide color gamut; it is not yet deployed. 

• sRGB refers to a transfer function used in PCs. 



• The transfer function of the original 1953 NTSC specification,
often written 1/2.2, has been effectively superseded
by Rec. 709.

• The transfer function of European specifications for
576i, often written 1/2.8, has been effectively superseded
by Rec. 709.



262 



DIGITAL VIDEO AND HDTV ALGORITHMS AND INTERFACES 



Figure 23.3 Rec. 709
transfer function is
used in SDTV and HDTV.




ITU-R Rec. BT.709, Basic param- 
eter values for the HDTV standard 
for the studio and for international 
programme exchange. 



Rec. 709 transfer function 

Figure 23.3 illustrates the transfer function defined by 
the international Rec. 709 standard for high-definition 
television (HDTV). It is based upon a pure power func- 
tion with an exponent of 0.45. Theoretically, a pure 
power function suffices for gamma correction; however, 
the slope of a pure power function (whose exponent is 
less than unity) is infinite at zero. In a practical system 
such as a television camera, in order to minimize noise 
in dark regions of the picture it is necessary to limit the 
slope (gain) of the function near black. Rec. 709 specifies
a slope of 4.5 below a tristimulus value of +0.018,
and scales and offsets the pure power function segment 
of the curve to maintain function and tangent conti- 
nuity at the breakpoint. 



The symbol L suggests linear. Take 
care not to confuse it with light- 
ness, L*. The symbol V' suggests 
voltage, or video. 



In this equation the tristimulus (linear light) component
is denoted L, and the resulting gamma-corrected
video signal (one of the R', G', or B' components) is
denoted with a prime symbol, $V'_{709}$. R, G, and B are
processed through identical functions to obtain R', G',
and B':



$$
V'_{709} = \begin{cases} 4.5\,L, & 0 \le L < 0.018 \\ 1.099\,L^{0.45} - 0.099, & 0.018 \le L \le 1 \end{cases}
$$

Eq 23.2

The Rec. 709 equation includes an exponent of 0.45.
However, the effect of the scale factor and offset terms
makes the overall power function very similar to
a square root ($\gamma_E \approx 0.5$). For this reason, it is misleading
to describe Rec. 709 as having "gamma of 0.45."
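The square-root claim can be checked numerically. This sketch evaluates Equation 23.2 above the breakpoint and compares it with $L^{0.5}$ and $L^{0.45}$. The sample grid is an arbitrary choice. The sketch also evaluates both segments at the breakpoint: they agree to within a few ten-thousandths, which I take to be a consequence of the published constants being rounded.

```python
def rec709_oetf(L):
    """Rec. 709 encoding (Eq 23.2) for tristimulus L in [0, 1]."""
    if L < 0.018:
        return 4.5 * L
    return 1.099 * L ** 0.45 - 0.099


samples = [i / 50 for i in range(1, 51)]   # L = 0.02 ... 1.00
err_sqrt = max(abs(rec709_oetf(L) - L ** 0.5) for L in samples)
err_045 = max(abs(rec709_oetf(L) - L ** 0.45) for L in samples)
print(f"max |V'709 - L^0.5|  = {err_sqrt:.3f}")
print(f"max |V'709 - L^0.45| = {err_045:.3f}")

# Linear and power segments evaluated at the breakpoint L = 0.018
print(4.5 * 0.018, 1.099 * 0.018 ** 0.45 - 0.099)
```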



See Headroom and footroom, 
on page 22. 



Rec. 709 encoding assumes that encoded R'G'B' signals
will be converted to tristimulus values at a CRT (or
some other display device) with a 2.5-power function
($\gamma_D \approx 2.5$):

$$
L = \left( V' \right)^{2.5}
$$

Eq 23.3

The product of the effective 0.5 exponent at the camera
and the 2.5 exponent at the display produces an end-to-end
power of about 1.25, suitable for the typical television
display environment, as I explained in Rendering
intent, on page 81. Should you wish to recover the RGB
scene tristimulus values, invert Equation 23.2:



$$
L = \begin{cases} \dfrac{V'_{709}}{4.5}, & 0 \le V'_{709} < 0.081 \\[2ex] \left( \dfrac{V'_{709} + 0.099}{1.099} \right)^{1/0.45}, & 0.081 \le V'_{709} \le 1 \end{cases}
$$

Eq 23.4
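Here is a sketch of Equation 23.4, checked by round-tripping a grid of tristimulus values through Equation 23.2. As the text below notes, the output is proportional to scene tristimulus values. No rendering-intent adjustment is attempted.

```python
def rec709_oetf(L):
    """Eq 23.2: tristimulus L in [0, 1] -> V'709."""
    return 4.5 * L if L < 0.018 else 1.099 * L ** 0.45 - 0.099


def rec709_inverse(v):
    """Eq 23.4: V'709 in [0, 1] -> scene tristimulus L (no rendering intent)."""
    if v < 0.081:
        return v / 4.5
    return ((v + 0.099) / 1.099) ** (1 / 0.45)


grid = [i / 1000 for i in range(1001)]
worst = max(abs(rec709_inverse(rec709_oetf(L)) - L) for L in grid)
print(f"worst round-trip error: {worst:.2e}")
```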



Equation 23.4 does not incorporate correction for
rendering intent: The recovered values are proportional 
to the scene tristimulus values, not to the intended 
display tristimulus values. Rec. 709 is misleading in its 
failure to discuss - or even mention - rendering intent. 



I have described signals in the abstract range 0 to 1.
When R'G'B' or Y' components are interfaced in 8 bits,
the 0 to 1 values are scaled by 219 and offset by +16. 
Interface codes below 16 and above 235 are used for 
footroom and headroom. (Codes 0 and 255 are used for 
synchronization, and are otherwise prohibited.) 

For interfaces having more than 8 bits, the reference
black and white levels are multiplied by $2^{k-8}$, where k
is the number of bits at the interface. For example,
when R'G'B' or Y' components are interfaced in 10 bits,
the 0 to 1 values are scaled by 219·4 (i.e., 876), and
offset by +64. (At the interface, codes having the
8 most-significant bits all zero, or all one, are prohibited
from video data.)
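The interface scaling just described can be sketched as a single function. The function name is hypothetical. The clamping to the permitted range (1 to 254 at 8 bits, and more generally excluding codes whose eight most-significant bits are all zero or all one) is my reading of the text.

```python
def interface_code(v, bits=8):
    """Map a component in the abstract range 0..1 to an integer interface code.

    Reference black maps to 16 * 2**(bits - 8) and reference white to
    235 * 2**(bits - 8); footroom and headroom remain available.
    """
    scale = 1 << (bits - 8)                      # 2**(k - 8)
    code = round(219 * scale * v + 16 * scale)
    # Codes whose 8 MSBs are all 0 or all 1 are reserved for synchronization
    lowest, highest = scale, 255 * scale - 1
    return max(lowest, min(highest, code))


for bits in (8, 10):
    print(bits, "bits: black, white =", [interface_code(v, bits) for v in (0.0, 1.0)])
```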
SMPTE 240M, 1125-Line High-Definition
Production Systems - Signal Parameters.



SMPTE 240M transfer function 

SMPTE Standard 240M for 1125/60, 1035i30 HDTV
was adopted two years before Rec. 709 was adopted.
Virtually all HDTV equipment deployed between 1988 
and 1998 uses the SMPTE 240M parameters. SMPTE 
and ATSC standards for HDTV now specify Rec. 709 
parameters; however, the standards previously accom- 
modated an "interim implementation" having the iden- 
tical transfer function to SMPTE 240M. SMPTE 240M's 
transfer function is this: 



$$
V'_{240} = \begin{cases} 4.0\,L, & 0 \le L < 0.0228 \\ 1.1115\,L^{0.45} - 0.1115, & 0.0228 \le L \le 1 \end{cases}
$$

Eq 23.5
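The text below describes the difference between the SMPTE 240M and Rec. 709 transfer functions as negligible for real images. To put a number on that, this sketch compares the encoding functions of Equations 23.2 and 23.5 over a fine grid. The largest difference, about 0.01 of full scale, occurs just below the SMPTE 240M breakpoint. That is roughly two codes at an 8-bit interface.

```python
def rec709_oetf(L):
    """Eq 23.2."""
    return 4.5 * L if L < 0.018 else 1.099 * L ** 0.45 - 0.099


def smpte240m_oetf(L):
    """Eq 23.5."""
    return 4.0 * L if L < 0.0228 else 1.1115 * L ** 0.45 - 0.1115


grid = [i / 10000 for i in range(10001)]
worst, where = max((abs(rec709_oetf(L) - smpte240m_oetf(L)), L) for L in grid)
print(f"largest difference {worst:.4f} at L = {where:.4f}")
# Expressed in 8-bit interface codes (219 codes from black to white)
print(f"about {worst * 219:.1f} codes at an 8-bit interface")
```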



To recover scene tristimulus values, use this relation:

$$
L = \begin{cases} \dfrac{V'_{240}}{4.0}, & 0 \le V'_{240} < 0.0913 \\[2ex] \left( \dfrac{V'_{240} + 0.1115}{1.1115} \right)^{1/0.45}, & 0.0913 \le V'_{240} \le 1 \end{cases}
$$

Eq 23.6

In 1993, CCIR was renamed
ITU-R.

ITU-R Rec. BT.1361, Worldwide
unified colorimetry and related
characteristics of future television
and imaging systems.

The difference between the SMPTE 240M and Rec. 709
transfer functions is negligible for real images. It is
a shame that international agreement could not have
been reached on the SMPTE 240M parameters that
were widely implemented in 1990, when the CCIR
discussions were taking place. The transfer function of
Rec. 709 is closely representative of current studio
practice, and should be used for all but very unusual
conditions.

Rec. 1361 transfer function

Rec. 1361 is intended to enable future HDTV systems
to achieve wider color gamut than Rec. 709, through
use of tristimulus signals having negative values and
values greater than unity. The Rec. 1361 transfer function
is identical to Rec. 709's for RGB tristimulus values
within Rec. 709's range, that is, between 0 and 1.
Tristimulus values from -1/4 to zero are subject to a transfer
function that is Rec. 709's function mirrored, and scaled
by a factor of 1/4, on both axes. Tristimulus values from
unity to +4/3 are subject to a straightforward extension
of the Rec. 709 curve.
Figure 23.4 Rec. 1361
transfer function

$\dfrac{160}{219} = \dfrac{2^5 \cdot 5}{73 \cdot 3}$

Encoding for Rec. 1361 is expressed in Equation 23.7:

$$
V'_{1361} = \begin{cases} -\dfrac{1.099\,(-4L)^{0.45} - 0.099}{4}, & -0.25 \le L < -0.0045 \\[2ex] 4.5\,L, & -0.0045 \le L < 0.018 \\ 1.099\,L^{0.45} - 0.099, & 0.018 \le L < 1.33 \end{cases}
$$

Eq 23.7


The function is graphed in Figure 23.4 above. 



For positive values of $V'_{1361}$, it is assumed that
a conventional display will apply a 2.5-power function 
to produce display tristimulus values. A wide-gamut 
display is expected to do whatever signal processing is 
necessary to deliver the colors within its gamut. 
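Here is a sketch of Equation 23.7. The final checks reproduce the endpoints quoted in the next paragraph: -1/4 maps to -0.25, and 4/3 maps to about 1.152. The function name is illustrative.

```python
def rec1361_oetf(L):
    """Rec. 1361 encoding (Eq 23.7); intended domain is -0.25 <= L <= 4/3."""
    if L < -0.0045:
        # Rec. 709 curve mirrored, and scaled by 1/4 on both axes
        return -(1.099 * (-4.0 * L) ** 0.45 - 0.099) / 4.0
    if L < 0.018:
        return 4.5 * L                     # linear segment through zero
    return 1.099 * L ** 0.45 - 0.099       # Rec. 709 curve, extended past unity


for L in (-0.25, -0.0045, 0.0, 0.5, 1.0, 4 / 3):
    print(f"L = {L:+.4f} -> V' = {rec1361_oetf(L):+.4f}")
```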



The gamma-corrected R'G'B' components of Rec. 1361
lie in the range [-0.25 ... +1.152]. Their black-to-white
excursion is reduced by the ratio 160/219 from that of
Rec. 709, SMPTE 274M, or SMPTE 296M R'G'B'
components: Scaled by 160 for 8-bit coding, they
would occupy the range [-40 ... 184.3]. If subsequently
offset +48 at an 8-bit interface, they would lie
in the range [8 ... 232.3]. However, Rec. 1361 is
intended for use with 10-bit components, at minimum.
As in Rec. 709, the reference black and white levels are
multiplied by $2^{k-8}$, where k is the number of bits at the
interface. When scaled and offset for 10-bit 4:4:4 interface,
Rec. 1361 R'G'B' ranges 192 through 832
(compared to a range of 64 through 940 for 10-bit
Rec. 709).

IEC FDIS 61966-2-1, Multimedia
systems and equipment - Colour
measurement and management -
Part 2-1: Colour management -
Default RGB colour space - sRGB.

sRGB transfer function 

The notation sRGB refers to a specification for color 
image coding for personal computers, and for image 
exchange on the Internet. The FlashPix file format for 
digital still cameras incorporates sRGB coding (there 
called NIFRGB). The sRGB specification calls for 
a transfer function very similar to - but regrettably not 
identical to - Rec. 709. The encoding is this: 

$$
V'_{\mathrm{sRGB}} =
\begin{cases}
12.92\,L, & 0 \le L \le 0.0031308 \\[6pt]
1.055\,L^{1/2.4} - 0.055, & 0.0031308 < L \le 1
\end{cases}
\qquad \text{Eq 23.8}
$$

Although the equation contains the exponent 1/2.4, the
scale factor and the offset cause the overall function
to approximate a pure 0.45-power function
(γE ≈ 0.45). It is misleading to describe sRGB as having
a "gamma of 0.42" (that is, 1/2.4).



Stokes, Michael, and Matthew 
Anderson, Srinivasan Chandrasekar, 
and Ricardo Motta, A Standard 
Default Color Space for the Internet - 
sRGB. Internet: www.color.org. 

See Rendering intent, on page 81.



γE ≈ 0.45 ≈ 1/2.22; γD ≈ 2.5



sRGB encoding assumes that conversion of the encoded
R'G'B' signals will be accomplished at a CRT with
a nominal 2.5-power function, as in Rec. 709 and
SMPTE 240M coding. However, the sRGB specification
anticipates a higher ambient light level for viewing than
Rec. 709 and SMPTE 240M: sRGB's effective
0.45-power function, displayed on a monitor with
a 2.5-power function, results in an end-to-end power
of 1.125 (0.45 × 2.5). This is considerably lower than
the 1.25 value produced by Rec. 709 encoding.



It is standard to code sRGB components in 8-bit form 
from 0 to 255, with no footroom and no headroom. 
[Figure 23.5 plot; x-axis: tristimulus value, L (relative), 0 to 1.0.]

Figure 23.5 Rec. 709, sRGB, and CIE L* transfer functions are compared. They are all approximately perceptually uniform; however, they are not close enough to be interchangeable.



Use this relation to recover scene tristimulus values - 
but not display tristimulus values! - from sRGB: 



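$$
L =
\begin{cases}
\dfrac{V'_{\mathrm{sRGB}}}{12.92}, & 0 \le V'_{\mathrm{sRGB}} \le 0.03928 \\[8pt]
\left(\dfrac{V'_{\mathrm{sRGB}} + 0.055}{1.055}\right)^{2.4}, & 0.03928 < V'_{\mathrm{sRGB}} \le 1
\end{cases}
\qquad \text{Eq 23.9}
$$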



Figure 23.5 sketches the sRGB transfer function, along- 
side the Rec. 709 and CIE L* functions. 



Transfer functions in SDTV 

Historically, transfer functions for 480i SDTV have been
very poorly specified. The FCC NTSC standard has, since
1953, specified R'G'B' encoding for a display with
a "transfer gradient (gamma exponent) of 2.2."
However, modern CRTs have power function laws very
close to 2.5! The FCC statement is widely interpreted to
suggest that encoding should approximate a power of
1/2.2; the reciprocal of 2.2, 0.45, appears in modern
standards such as Rec. 709. However, as I mentioned
on page 263, Rec. 709's overall curve is very close to
a square root. The FCC specification should not be
taken too seriously: Use Rec. 709 for encoding.



Standards for 576i SDTV also have poorly specified
transfer functions. An "assumed display power function"
of 2.8 is mentioned in BBC specifications; some
people interpret this as suggesting an encoding exponent
of 1/2.8. However, the 2.8 value is unrealistically
high. In fact, European displays are comparable to
displays in other parts of the world, and encoding to
Rec. 709 is appropriate.

In Rec. 601 coding with 8 bits, the
black-to-white range without footroom
or headroom encompasses
220 levels. For linear-light coding of
this range, 10 bits would suffice:

4.5 · 220 = 990; 990 < 2^10

With 10-bit coding (4 · 220 = 880 levels),
12 bits would suffice:

4.5 · 880 = 3960; 3960 < 2^12

Although there are standards to specify viewing condi- 
tions in the studio, no standard specifies the transfer 
function of an idealized studio monitor! With studio 
monitor transfer functions unspecified, it is no surprise 
that there is no standard for consumer monitors. 
Implicitly, a 2.5-power function is assumed. 

Bit depth requirements 

In Figure 8.1 on page 76, as part of Chapter 8's discus- 
sion of constant luminance, I indicated that conveying 
relative luminance directly would require about 11 bits. 
That observation stems from two facts. First, studio 
video experience proves that 8 bits is just sufficient to 
convey gamma-corrected R'G'B' - that is, 2^8 (or 256)
nonlinear levels are sufficient. Second, the transfer func- 
tion used to derive gamma-corrected R'G'B' has 
a certain maximum slope; a maximum slope of 4.5 is 
specified in Rec. 709. The number of codes necessary in 
a linear-light representation is the product of these two 
factors: 256 times 4.5 is 1152, which requires 11 bits. 

In studio video, 8 bits per component barely suffice for 
distribution purposes. Some margin for roundoff error is 
required if the signals are subject to processing opera- 
tions. For this reason, 10-bit studio video is now usual. 
To maintain 10-bit Rec. 709 accuracy in a linear-light 
system would require 12 bits per component; to 
achieve 10-bit L* or sRGB performance would require 
14 bits per component in a linear-light representation.

The Rec. 709 transfer function is suitable for video
intended for display in the home, where contrast ratio
is limited by the ambient environment. For higher-quality
video, such as home theater, or for the adaptation
of HDTV to digital cinema, we would like a higher
maximum gain. When scaled to a lightness range of
unity, CIE L* has a maximum gain of 9.033; sRGB has
a gain limit of 12.92. Both gains lie between 2^3 and 2^4,
so for these systems linear-light representation requires
4 bits in excess of 10 on the nonlinear scale - that is,
14 bits per component.
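The bit-depth figures in this section all follow one rule: multiply the number of nonlinear black-to-white levels by the maximum slope of the transfer function, then find the smallest power of two that covers the product. A minimal sketch in C; the function name is hypothetical, and treating a 10-bit nonlinear scale as 1024 levels for the L* and sRGB cases is an assumption that reproduces the 14-bit figure.

```c
#include <math.h>

/* Smallest number of bits k such that 2^k linear-light codes cover
   levels * max_slope, the resolution needed near black to match a
   nonlinear code with the given number of levels and maximum slope. */
int linear_bits_needed(double levels, double max_slope)
{
    double codes = levels * max_slope;
    int k = 0;
    while (ldexp(1.0, k) < codes)   /* ldexp(1.0, k) is 2^k */
        k++;
    return k;
}
```

Here linear_bits_needed(256, 4.5) returns 11 (1152 codes), linear_bits_needed(220, 4.5) returns 10, linear_bits_needed(880, 4.5) returns 12, and both linear_bits_needed(1024, 9.033) and linear_bits_needed(1024, 12.92) return 14.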



PDP and DLP devices are 
commonly described as 
employing PWM. However, it is 
not quite the width of the pulses 
that is being modulated, but the 
number of unit pulses per frame. 


If RGB or XYZ tristimulus components were conveyed 
directly, then 16 bits in each component would suffice 
for any realistic image-reproduction purpose. Linear- 
light 16-bit coding might be practical in a decade, but 
for now, for most purposes, we exploit the nonlinear 
characteristics of perception to achieve an efficient 
image data coding. 

Gamma in emerging display devices 

Emerging display devices, such as liquid crystal displays 
(LCDs), have different transfer functions than CRTs. 
Plasma display panels (PDPs) and Digital Light Proces- 
sors (DLPs) both achieve apparent continuous tone 
through pulse width modulation (PWM): They are intrin- 
sically linear-light devices, with straight-line transfer 
functions. Linear-light devices, such as PDPs and DLPs, 
potentially suffer from the "code 100" problem 
explained on page 12: In linear-light, more than 8 bits 
per component are necessary to achieve high quality. 

No matter what transfer function characterizes the 
display, it is economically important to encode image 
data in a manner that is well matched to perceptual 
requirements. The most important aspect of Rec. 709 
encoding is not that it is well matched to CRTs, but that 
it is well matched to perception! The performance 
advantage of perceptual coding, the wide deployment 
of equipment that encodes to Rec. 709, and the huge 
amount of program material already encoded to this 
standard preclude any attempt to establish new stan- 
dards optimized to particular devices. 

A display device whose transfer function differs from 
a CRT must incorporate local correction, to adapt from 
its intrinsic transfer function to the transfer function 
that has been standardized for image interchange. 

CRT transfer function details 

To calibrate your monitor, or to determine the transfer
function of your CRT, you must be familiar with the
electrical interface between a computer framebuffer
and a monitor.

[Figure 23.6 scales, reference black to reference white, plotted against relative luminance: analog video, zero setup, 7:3 (EBU N10; HDTV); analog video, zero setup, 10:4 (NTSC, NTSC-J); analog video, 7.5% setup; Rec. 601 digital video (interface levels); Rec. 601 digital video (processing levels); typical computer framebuffer code.]

Figure 23.6 CRT signal levels and luminance. An analog video signal may be coded between 0 and 700 mV, between 0 and 714 mV, or between 54 mV and 714 mV. A digital signal may be coded from 16 to 235 (for Rec. 601 studio video interface), from 0 to 219 (for Rec. 601-related studio video signal processing), or from 0 to 255 (as is typical in computer graphics).

Figure 23.6 illustrates the function that relates the signal
input to a CRT monitor to the luminance produced
at the face of the screen. The graph characterizes
a grayscale monitor, or each of the red, green, and blue
components of a color monitor. The x-axis of the graph
shows the input signal level, from reference black to
reference white. The input signal can be presented as
a digital code, or as an analog voltage according to one
of several standards. The y-axis shows the resulting
relative luminance.

For analog voltage signals, three standards are in use.
The range 54 mV to 714 mV is used in video systems
that have 7.5% setup, including composite 480i
systems such as NTSC, and computer video systems
that conform to the levels of the archaic EIA RS-343-A
standard. Computer framebuffer digital-to-analog
converters often have 7.5% setup; these almost
universally have very loose tolerance (about ±5% of
full scale) on the analog voltage associated with refer-
ence black. This induces black-level errors, which in
turn cause serious errors in the luminance reproduced
for black. In the absence of a display calibrator, you
must compensate these framebuffer black-level errors
by adjusting the black level (or brightness) control on
your monitor. This act effectively marries the monitor to
the framebuffer.

Details will be presented in Setup
(pedestal), on page 327.

Concerning the conversion between
Rec. 601 levels and the full-range
levels used in computing, see
Figure 27.3, on page 329.

The accuracy of black-level reproduction is greatly 
improved in newer analog video standards that have 
zero setup. The voltage range 0 to 700 mV (0 to 714 mV
in NTSC-J) is used in zero-setup standards, including
480i video in Japan, 576i video in Europe, and HDTV.

For the 8-bit digital RGB components that are ubiqui- 
tous in computing, reference black and white corre- 
spond to digital codes 0 and 255. The Rec. 601 
interface for studio digital video places black at code 16 
and white at code 235. Either of these digital coding 
standards can be used in conjunction with an analog 
interface having either 7.5% setup or zero setup. 
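Because reference black and reference white are defined on every scale of Figure 23.6, moving a level between a digital scale and an analog scale amounts to linear interpolation between the two references. A sketch in C with a hypothetical function name; codes outside the black-to-white range (footroom and headroom) are simply extrapolated, and the 54 mV setup level is the rounded value shown in the figure.

```c
/* Map a digital code to an analog level in millivolts by linear
   interpolation between reference black and reference white on
   both scales. Codes outside [code_black, code_white] extrapolate. */
double code_to_mv(int code, int code_black, int code_white,
                  double mv_black, double mv_white)
{
    double t = (double)(code - code_black) / (code_white - code_black);
    return mv_black + t * (mv_white - mv_black);
}
```

For example, code_to_mv(235, 16, 235, 54.0, 714.0) gives 714 mV for Rec. 601 interface white on a 7.5%-setup interface, and code_to_mv(128, 0, 255, 0.0, 700.0) gives about 351 mV for a mid-scale framebuffer code on a zero-setup interface.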

Knowing that a CRT is intrinsically nonlinear, and that
its response is based on a power function, many
researchers have attempted to summarize the nonlinearity
of a CRT display in a single numerical parameter γ
using this relationship, where V' is code (or voltage) and
L is luminance (or tristimulus value):

$$ L = (V')^{\gamma} \qquad \text{Eq 23.10} $$

The model forces zero voltage to map to zero lumi- 
nance for any value of gamma. Owing to the model 
being "pegged" at zero, it cannot accommodate black- 
level errors: Black-level errors that displace the transfer 
function upward can be "fit" only by an estimate of 
gamma that is much smaller than 2.5. Black-level errors 
that displace the curve downward - saturating at zero 
over some portion of low voltages - can be "fit" only 
with an estimate of gamma that is much larger than 
2.5. The only way the single gamma parameter can fit 
a black-level variation is to alter the curvature of the 
function. The apparent wide variability of gamma under
this model has given gamma a bad reputation. 

A much better model is obtained by fixing the exponent
of the power function at 2.5, and using the single
parameter to accommodate black-level error, ε:

$$ L = (V' + \varepsilon)^{2.5} \qquad \text{Eq 23.11} $$

This model fits the observed nonlinearity much better 
than the variable-gamma model. 

If you want to determine the nonlinearity of your 
monitor, consult the article by Cowan. In addition to 
describing how to measure the nonlinearity, he 
describes how to determine other characteristics of 
your monitor - such as the chromaticity of its white 
point and its primaries - that are important for accu- 
rate color reproduction. 

Gamma in video, CGI, SGI, and Macintosh 

Transfer functions in video (and PC), computer-gener- 
ated imagery, SGI, and Macintosh are sketched in the 
rows of Figure 23.7 overleaf. Each row shows four func- 
tion blocks; from left to right, these are a camera or 
scanner LUT, an image storage device, an output LUT, 
and a monitor. 

In video, sketched in the top row, the camera applies 
a transfer function to accomplish gamma correction. 
Signals are then maintained in a perceptual domain 
throughout the system until conversion to tristimulus 
values at the monitor. I show the output LUT with 
a ramp that leaves data unaltered: Video systems 
conventionally use no LUT, but the comparison is clari- 
fied if I portray the four rows with the same blocks. 

PC graphics hardware ordinarily implements lookup 
tables at the output of the framestore, as I detailed in 
Raster images, on page 34. However, most PC software 
accommodates display hardware without lookup tables. 
When the LUT is absent, code values map directly to 
voltage, and the situation is equivalent to video. So, the 
top row in the diagram pertains to PCs. 
[Figure 23.7 diagram. Rows: Video, PC; Computer-generated imagery; SGI; Macintosh. Columns: tristimulus input, gamma correction (camera or scanner), framestore or framebuffer, output LUT (implicit in video), monitor. End-to-end exponents at right: 1.25, 1.14, 1.14, 1.0.]

Figure 23.7 Gamma in video, CGI, SGI, and Macintosh are summarized in the rows of this diagram. Tristimulus signals enter from the left; the columns show the transfer functions of (respectively) a camera or scanner; the image storage device (framestore or framebuffer); output LUT; and the monitor.

In video, sketched in the top row, a transfer function that mimics vision is applied at the camera ("gamma correction"); the signal remains in perceptual space until the encoding is reversed by the monitor. (PCs have comparable signal encoding.) In computer graphics, sketched in the second row, calculations are performed in the linear-light domain, and gamma correction is applied in a LUT at the output of the framebuffer. SGI computers take a hybrid approach: Part of the correction is accomplished at the camera or scanner, and part is accomplished through a 1/1.7-power function that is loaded into the LUT. Macintosh computers, sketched in the bottom row, also take a hybrid approach: The camera or scanner applies a 1/1.72 power, and a 1/1.45-power function is loaded into the LUT. Using γE = 1/1.72 is appropriate for prerendered imagery, to produce an end-to-end exponent of 1.0. The end-to-end power function exponent, or rendering intent (see page 81), is shown for each row by the number at the extreme right. This number is the product of the exponents across the system. Some people call this "system gamma," but that term is so widely misused that I reject it.
The Macintosh computer by
default implements a 1/1.45-power
function at the output LUT. John
Knoll's Gamma Control Panel can
load the output LUT. When set to
a gamma value g, the Control
Panel loads an output LUT with
a power function whose exponent
is g/2.61. Strangely, gamma on
Macintosh computers has come to
be quoted as the exponent applied
prior to the framebuffer (whereas
in other computers it is the exponent
of the table loaded into the
output LUT). So, the Mac's default
gamma is said to be 1.8, not 1.45.

A Macintosh can be set to handle
video (or PC) R'G'B' data by
loading a ramp into its output
LUT. Using Knoll's control panel,
this is accomplished by setting
gamma to 2.61.

JFIF files originated on Macintosh
ordinarily represent R, G, and B
display tristimulus values raised to
the 1/1.72 power.



Computer graphics systems generally store tristimulus 
values in the framebuffer, and use hardware LUTs, in the 
path to the display, to gamma-correct on the fly. This is 
illustrated in the second row. Typically, a 1/2.2-power
function is loaded into the output LUT; in this case,
a rendering intent of 1.14 is achieved.

Macintosh computers use the approach shown in the 
bottom row. The output LUT is, by default, loaded with 
a 1/1.45-power function. The combination of the default
LUT and the usual 2.5-power monitor function results 
in a 1.72-power function that relates QuickDraw R'G'B' 
values (such as the values stored in a PICT file or data 
structure) to displayed tristimulus values. 

If a desktop scanner is to produce QuickDraw R'G'B'
values that display relative luminance correctly, then
a 1/1.72-power function must be loaded into the scanner
LUT. In the typical Macintosh situation, the 1/1.72,
1/1.45, and 2.5 exponents combine to achieve an
end-to-end exponent of unity. This is suitable for scanning
photographs or offset printed matter, where a suitable
rendering intent is already incorporated into the image.
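Because each block in Figure 23.7 applies a pure power function, the end-to-end exponent is the product of the exponents along the row. A minimal sketch in C (hypothetical function name); it treats the framestore as a unity exponent and, for video, approximates the Rec. 709 encoder by the square root mentioned earlier in this chapter.

```c
/* End-to-end exponent of a chain of pure power functions: the product
   of the exponents (camera or scanner, framestore, LUT, monitor). */
double end_to_end(const double *exponents, int n)
{
    double p = 1.0;
    for (int i = 0; i < n; i++)
        p *= exponents[i];
    return p;
}
```

For example, {0.5, 1.0, 1.0, 2.5} gives 1.25 for video, {1.0, 1.0, 1/2.2, 2.5} gives about 1.14 for CGI, and {1/1.72, 1.0, 1/1.45, 2.5} gives about 1.002 for the Macintosh scanning case.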

For QuickDraw R'G'B' values originated by application 
software, part of Macintosh gamma correction must be 
effected by application software prior to presentation of 
R'G'B’ values to the QuickDraw graphics subsystem; the 
remainder is accomplished in the output LUTs. When 
scanning, part of Macintosh gamma correction is 
effected by the LUT in the scanner driver, and the 
remainder is accomplished in the output LUTs. 

Halftoned printing has a built-in nonlinearity, owing to
the phenomenon of dot gain. Reflectance from the 
printed page is approximately proportional to the 
1.8-power of CMYK code values. QuickDraw R'G'B' 
values are not perceptually optimum; however, appar- 
ently by serendipity, QuickDraw R'G'B' coding is nearly 
perfectly matched to the dot gain of halftone printing. 
This has led to the dominance of Macintosh computers 
in graphic arts and prepress, and has made "gamma
1.8" image coding a de facto standard.
An SGI workstation can be set to
handle video (or PC) R'G'B' data
by setting gamma to 1.

To correct PC image data for display
on a Mac, apply a 1.45-power function.
To correct Mac image data for
display on a PC, apply a 0.69-power
function - that is, a 1/1.45-power function.
SGI (formerly Silicon Graphics) computers, by default,
use an output LUT containing a 1/1.7-power function; this
is shown in the third row. If a scanner is to produce
images for display on an SGI system (without imposing
any rendering intent), it must incorporate a transfer
function whose exponent is approximately 1/1.47.

At the right-hand end of each row of Figure 23.7, on 
page 274, I have indicated in boldface type the 
rendering intent usually used. In video, I have shown an 
end-to-end power function of 1.25. For computer- 
generated imagery and SGI, I have shown the typical 
value of 1.14. For Macintosh, I have sketched the usual 
situation where prerendered images are being scanned; 
in this case, the end-to-end power function is unity. 

Correct display of computer image data depends upon
knowing the transfer function that will be applied at the
output of the graphics subsystem. If an image originates
on a PC, after traversing the default 1/1.45-power
function in a Mac LUT, midtones will display too light:
Code 128 will produce luminance 1.6 times higher than
intended. Conversely, if an image originates on a Mac
(where the 1/1.45-power function is expected), but is
displayed on a PC (without this function), midtones will
display much too dark. The relationship between
default R'G'B' code values and reproduced luminance
factors is graphed in Figure 23.8.

Gamma in computer graphics 

Computer-generated imagery (CGI) software systems 
generally perform calculations for lighting, shading, 
depth-cueing, and antialiasing using approximations to 
tristimulus values, so as to model the physical mixing of 
light. Values stored in the framebuffer are processed by 
hardware lookup tables on the fly on their way to the 
display. If linear-light values are stored in the frame- 
buffer, the LUTs can accomplish gamma-correction. The 
power function at the CRT acts on the gamma- 
corrected signal voltages to reproduce the correct lumi- 
nance values at the face of the screen. Software systems 
usually provide a default gamma value and some 
method to change the default. 
Figure 23.8 Gamma on Mac and PC differs, owing to the interpretation of RGB code values by the display system. On a PC, the output LUT is either absent or programmed as if absent, and code values are subject to the 2.5-power function of the display (sketched in the lower curve). On a Mac, the default output LUT imposes a 1/1.45-power function on the code values, then the display imposes its usual 2.5-power function; the concatenation of these two functions results in a 1.72-power function that relates Mac code value to displayed relative luminance, as sketched in the upper curve.



You can construct a gamma-correction lookup table to 
apply Rec. 709 to 8-bit tristimulus data, using this 
C code: 

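```c
#include <math.h>

/* Clamp v to the range [b, t]. */
#define CLIP(b, t, v) ((v) <= (b) ? (b) : (v) >= (t) ? (t) : (v))

/* Rec. 709 transfer function of a tristimulus value L in [0, 1]. */
#define REC_709(L) ((L) <= 0.018 ? (L) * 4.5 : \
                    (1.099 * pow((L), 0.45) - 0.099))

int rec_709[256];

/* Fill rec_709[] with rounded 8-bit gamma-corrected codes. */
void build_rec_709_table(void)
{
    for (int i = 0; i < 256; i++)
        rec_709[i] = CLIP(0, 255,
                          (int)(0.5 + 255.0 * REC_709(i / 255.)));
}
```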

Loading this table into the hardware lookup table at the 
output side of a framebuffer will cause integer RGB tris- 
timulus values r, g, and b between 0 and 255 to be 
gamma-corrected by the hardware as if by the following 
C code: 

```c
red_signal = rec_709[r];
green_signal = rec_709[g];
blue_signal = rec_709[b];
```
The Rec. 709 function is suitable for viewing in a dim
surround. For viewing in other environments, see the 
comments on page 84: In a bright surround, cascade 
Rec. 709 with a 0.9-power function; in a dark surround, 
cascade it with a 1.2-power function. 

The framebuffer's LUTs enable software to perform 
tricks to manipulate the appearance of the image data 
without changing the image data itself. To allow the 
user to make use of features such as accurate color 
reproduction, applications should access lookup tables 
in the structured ways that are provided by the graphics 
system, and not by direct manipulation of the LUTs. 

Gamma in pseudocolor 

In Pseudocolor, on page 38, I described how the color 
lookup table (CLUT) in a pseudocolor system contains 
values that are directly mapped to voltage at the 
display. It is conventional for a pseudocolor application 
program to provide, to a graphics system, R'G'B' color 
values that are already gamma corrected for a typical 
monitor and typical viewing conditions. A pseudocolor 
image stored in a file is accompanied by a colormap 
whose R'G'B’ values incorporate gamma correction. 

Limitations of 8-bit linear coding 

As mentioned in Gamma in computer graphics, on 
page 276, computer graphics systems that render 
synthetic imagery usually perform computations in the 
linear-light - or loosely, intensity - domain. Low-end 
graphics accelerators often perform Gouraud shading in 
the linear-light domain, and store 8-bit components in 
the framebuffer. In The "code 100" problem, on page 12, 
I explained that linear-light representation cannot 
achieve high-quality images with just 8 bits per compo- 
nent: The images will exhibit contouring. The visibility 
of contouring is enhanced by a perceptual effect called 
Mach bands; consequently, the contouring artifact is
sometimes called banding. 

High-end systems for computer-generated imagery
(CGI) usually do not depend on hardware acceleration.
Rendering software operates in the linear-light domain
using more than 8 bits per component (often floating
point), performs gamma correction in software, then
writes gamma-corrected values into the framebuffer.

A unity ramp is loaded into the LUT of the framebuffer
associated with the image. This arrangement maximizes
perceptual performance, and produces rendered
imagery without the quantization artifacts of 8-bit
linear-light coding.

Figure 23.9 Linear and
nonlinear coding in computer
graphics standards. In the
PHIGS and CGM standards,
code [128, 128, 128] produces
luminance halfway up the
physical scale, a relative
luminance of 0.5. In JPEG, code
[128, 128, 128] produces
luminance halfway up the
perceptual scale, only about 0.18 in
relative luminance. Values are
denoted RGB in both cases;
however, the values are not
comparable. This exemplifies
a serious problem in the
exchange of image files.

Professional video software on Macintosh or SGI plat- 
forms ordinarily loads the output LUT with a ramp func- 
tion; code values are then interpreted as in video. 
Unfortunately, colors are altered in image data or inter- 
face elements that assume the default gamma of the 
platform. 

Linear and nonlinear coding in CGI 

Computer graphic standards such as PHIGS and CGM 
make no mention of transfer function, but linear-light 
coding is implicit. In the JPEG standard there is also no 
mention of transfer function, but nonlinear (video-like) 
coding is implicit: Unacceptable results are obtained 
when JPEG is applied to linear-light data. All of these 
standards deal with RGB quantities; you might consider 
their RGB values to be comparable, but they're not! 

What are loosely called JPEG files
use the JPEG File Interchange
Format (JFIF), cited in the margin
of page 459. Version 1.02 of that
specification states that linear-light
coding (gamma 1.0) is used. That
is seldom the case in practice;
instead, encoding power laws of
0.45 (sRGB) or 0.58 (i.e., 1.45/2.5)
are usually used. See page 273.

Figure 23.9 sketches two systems displaying the same
RGB triple, [128, 128, 128]. A photometer reading the
luminance displayed by a PHIGS or CGM system is
shown at the left; a photometer reading luminance
displayed by a JPEG system is shown at the right. In
PHIGS and CGM, the displayed luminance is halfway up
the physical scale, a relative luminance of 0.5. In the
JPEG case, displayed luminance is halfway up the
perceptual scale, only about 0.18 in relative luminance.
The PHIGS and CGM standards are obsolete; however,
the problem persists that many graphics image files do
not carry any transfer function information. If you
exchange RGB image data without regard for transfer
functions, huge differences will result when image data
is displayed.



The digital image-processing literature rarely discrimi- 
nates between linear and nonlinear coding. Also, when 
intensity is mentioned, be suspicious: Image data may 
be represented in linear-light form, proportional to 
intensity. However, a pixel component value is usually 
associated with a small area of a sensor or a display, so 
its units should include a per square meter (m^-2) term.
Pixel component values are ordinarily properly repre- 
sented as radiance, luminance, relative luminance, or 
tristimulus value. 
This chapter describes color coding systems that are
used to convey image data derived from additive
(RGB) primaries. I outline nonlinear R'G'B', explain the
formation of luma, denoted Y', as a weighted sum of
these nonlinear signals, and introduce the color
difference (chroma) components [B'-Y', R'-Y'], [CB, CR], and
[PB, PR].



The design of a video coding system is necessarily 
rooted in detailed knowledge of human color percep- 
tion. However, once this knowledge is embodied in 
a coding system, what remains is physics, mathematics, 
and signal processing. This chapter concerns only the 
latter domains. 

Color acuity 

A monochrome video system ideally senses relative 
luminance, described on page 205. Luminance is then 
transformed by the gamma correction circuitry of the 
camera, as described in Gamma in video, on page 261, 
into a signal that takes into account the properties of 
lightness perception. At the receiver, the CRT itself 
imposes the required inverse transfer function. 

A color image is sensed in three components, red, 
green, and blue, according to Additive reproduction 
(RGB), on page 234. To minimize the visibility of noise 
or quantization, the RGB components should be coded 
nonlinearly. 
[Figure 24.1 panels: RGB cube (top) and R'G'B' cube (bottom); gray axis (R' = G' = B').]

Figure 24.1 RGB and R'G'B' cubes. RGB components form the coordinates of a three-dimensional color space; coordinate values between 0 and 1 define the unit cube. Linear coding, sketched at the top, has poor perceptual performance. In video, RGB components are subject to gamma correction; this yields perceptually uniform R'G'B' that exhibits good performance with 8-bit components.
RGB and R'G'B' color cubes



Red, green, and blue tristimulus (linear light) primary 
components, as detailed in Color science for video, on 
page 233, can be considered to be the coordinates of 
a three-dimensional color space. Coordinate values 
between zero and unity define the unit cube of this 
space, as sketched at the top of Figure 24.1 opposite. 
Linear-light coding is used in CGI, where physical light 
is simulated. However, as I explained in the previous 
chapter, Gamma in video, 8-bit linear-light coding 
performs poorly for images to be viewed: 12 or 14 bits 
per component are necessary to achieve excellent 
quality. The best perceptual use is made of a limited 
number of bits by using nonlinear coding that mimics 
the nonlinear lightness response of human vision. As 
introduced on page 12, and detailed in the previous 
chapter, in video, JPEG, MPEG, computing, digital still 
photography, and in many other domains a nonlinear 
transfer function is applied to RGB tristimulus signals to 
give nonlinearly coded (gamma-corrected) components,
denoted with prime symbols: R'G'B’. Excellent image 
quality is obtained with 10-bit nonlinear coding with 
a transfer function similar to that of Rec. 709 or sRGB. 

In PC graphics, 8-bit nonlinear coding is common: Each
of R', G', and B' ranges from 0 through 255, inclusive,
following the quantizer transfer function sketched in
Figure 2.1, on page 17. The resulting R'G'B' cube is
sketched at the bottom of Figure 24.1 opposite. A total
of 2^24 colors (that is, 16777216 colors) are
representable. Not all of them can be distinguished visually; not
all are perceptually useful; but they are all colors. Studio
video uses headroom and footroom, as I explained in
Headroom and footroom, on page 22: 8-bit R'G'B' has
220 codes from black to white inclusive (16 through 235),
for a total of 220^3, or 10648000, codewords.

The drawback of conveying R'G'B' components of an
image is that each component requires relatively high
spatial resolution: Transmission or storage of a color
image using R'G'B' components requires a capacity
three times that of a grayscale image. Human vision has
considerably less spatial acuity for color information
than for lightness. Owing to the poor color acuity of
vision, a color image can be coded into a wideband
monochrome component representing lightness, and
two narrowband components carrying color information,
each having substantially less spatial resolution
than lightness. In analog video, each color channel has
bandwidth typically one-third that of the monochrome
channel. In digital video, each color channel has half
the data rate (or data capacity) of the monochrome
channel, or less. There is strong evidence that the
human visual system forms an achromatic channel and
two chromatic color-difference channels at the retina.

In video, codeword (or codepoint)
refers to a combination of three
integer values such as [R', G', B']
or [Y', CB, CR].



Here the term color difference refers 
to a signal formed as the difference 
of two gamma-corrected color 
components. In other contexts, the 
term can refer to a numerical 
measure of the perceptual distance 
between two colors. 



Green dominates luminance: Between 60% and 70% of 
luminance comprises green information. Signal-to-noise 
ratio is maximized if the color signals on the other two 
components are chosen to be blue and red. The 
simplest way to "remove" lightness from blue and red is
to subtract it, to form a pair of color difference (or 
loosely, chroma) components. 



I introduced interface offsets 
on page 23. 



The monochrome component in color video could have 
been based upon the luminance of color science 
(a weighted sum of R, G, and B). Instead, as I explained 
in Constant luminance, on page 75, luma is formed as 
a weighted sum of R', G', and B', using coefficients 
similar or identical to those that would be used to 
compute luminance. Expressed in abstract terms, luma 
ranges 0 to 1. Color difference components B'-Y' and 
R'-Y' are bipolar; each ranges nearly ±1. 

In component analog video, B'-Y' and R'-Y' are scaled
to form PB and PR components. In abstract terms, these
range ±0.5. Figure 24.2 opposite shows the unit R'G'B'
cube transformed into [Y', PB, PR] space. (Various interface
standards are in use; see page 303.) In component
digital video, B'-Y' and R'-Y' are scaled to form CB and
CR components. In 8-bit Y'CBCR, prior to the application
of the interface offset, the luma axis of Figure 24.2
would be scaled by 219, and the chroma axes by 112.
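A minimal Python sketch of that scaling, assuming the customary 8-bit interface offsets of +16 for luma and +128 for chroma (offsets are introduced on page 23; the specific values are supplied here as an assumption). The function names are illustrative only; rounding is half-up and no clipping is applied:

```python
import math

def to_code(x):
    """Round half up to the nearest integer code (no clipping)."""
    return math.floor(x + 0.5)

def ypbpr_to_ycbcr8(y, pb, pr):
    """Abstract Y' (0..1) and PB, PR (-0.5..+0.5) -> 8-bit studio Y'CbCr.
    Luma is scaled by 219; chroma is scaled by 224, giving +/-112.
    The interface offsets (assumed 16 for luma, 128 for chroma) are
    then added."""
    return (to_code(16 + 219 * y),
            to_code(128 + 224 * pb),
            to_code(128 + 224 * pr))

print(ypbpr_to_ycbcr8(0.0, 0.0, 0.0))          # reference black
print(ypbpr_to_ycbcr8(1.0, 0.0, 0.0))          # reference white
print(ypbpr_to_ycbcr8(0.114, 0.5, -0.081312))  # Rec. 601 blue
```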



Once color difference signals have been formed, they 
can be subsampled to reduce bandwidth or data 
capacity, without the observer's noticing, as I will 
explain in Chroma subsampling, revisited, on page 292. 
Figure 24.2 Y'PBPR cube is
formed when R', G', and B'
are subject to a particular
3x3 matrix transform. The
valid R'G'B' unit cube occupies
about one-quarter of the
volume of the Y'PBPR unit
cube. (The volume of the
Y'PBPR unit cube, the outer
boundary of this sketch, is
the same as the volume of
the R'G'B' cube in Figure 24.1
on page 282; however, the
useful codes occupy only the
central prism here.) Luma
and color difference coding
incurs a penalty in signal-to-noise
ratio, but this disadvantage
is compensated by the
opportunity to subsample.



Izraelevitz, David, and Joshua L. 
Koslov, “Code Utilization for 
Component-coded Digital Video," 
in Tomorrow's Television, Proc. 
16th Annual SMPTE Television 
Conference (White Plains, N.Y.: 
SMPTE, 1982), 22-30. 
It is evident from Figure 24.2 that when R'G'B' signals
are transformed into the Y'PBPR space of analog video,
the unit R'G'B' cube occupies only part of the volume
of the unit Y'PBPR cube: Only 1/4 of the Y'PBPR volume
corresponds to R'G'B' values all between 0 and 1.
Consequently, Y'PBPR exhibits a loss of signal-to-noise
ratio compared to R'G'B'. However, this disadvantage is
compensated by the opportunity to subsample.
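The "one-quarter" figure can be checked without any plotting. A linear map scales volume by the absolute value of its determinant, and both unit cubes have volume 1, so the valid fraction is |det P|. A minimal Python sketch, using the three-decimal Rec. 601 values that appear as Equation 24.1 later in this chapter. The helper name det3 is hypothetical:

```python
# Rec. 601 encoding matrix P (three-decimal values):
# rows produce Y', PB, PR from R', G', B'.
P = [
    [0.299, 0.587, 0.114],
    [-0.169, -0.331, 0.5],
    [0.5, -0.419, -0.081],
]

def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# The unit R'G'B' cube (volume 1) maps to a parallelepiped of volume
# |det P| lying inside the Y'PBPR unit cube (Y' in [0, 1], PB and PR in
# [-0.5, +0.5]), which also has volume 1.
valid_fraction = abs(det3(P))
print(f"valid fraction of the Y'PBPR cube: {valid_fraction:.3f}")
```

The result is close to 0.236. With exact coefficients it equals the green luma coefficient divided by the product of the two color difference scale factors, 0.587 / (1.772 x 1.402).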

A legal signal is one in which no component exceeds its
reference excursion. Combinations that are R'G'B'-legal
are termed valid. Signals within the Y'PBPR unit cube are
Y'PBPR-legal. However, about 3/4 of these combinations
correspond to R'G'B' combinations outside the R'G'B'
unit cube: Although legal, these Y'PBPR combinations
are invalid - that is, they are R'G'B'-illegal.

In digital video, we refer to codewords instead of
combinations. There are about 2.75 million valid codewords
in 8-bit Y'CBCR, compared to 10.6 million in
8-bit studio R'G'B'. If R'G'B' is transcoded to Y'CBCR,
then transcoded back to R'G'B', the resulting R'G'B'
cannot have any more than 2.75 million colors.
Figure 24.3 Conventional
luma/color difference 
encoder. Numerical coeffi- 
cients here are for SDTV; 
different coefficients are 
standard for HDTV. 
Conventional luma/color difference coding

I explained constant luminance on page 75. True 
constant luminance coding remains an intriguing possi- 
bility, but at present all video systems use nonconstant 
luminance coding, which I will now describe. 



Figure 24.3 shows a time delay 
element in the luma path. Luma is 
delayed by a time interval equal to 
the transit delay of chroma through 
the chroma bandlimiting filters. 



A conventional luma/color difference encoder is shown
in Figure 24.3 above. First, a nonlinear transfer function
is applied to each of the red, green, and blue linear
(tristimulus) components. Then luma is formed as a
weighted sum of gamma-corrected R', G', and B'
components. B'-Y' and R'-Y' color difference components
are formed by subtraction; in Figure 24.3, scaling
to analog PB and PR components is indicated. Finally,
the color difference components are lowpass filtered.



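Eq 24.1 Rec. 601 Y'PBPR
encoding matrix (for SDTV)

$$
\begin{bmatrix} {}^{601}Y' \\ P_B \\ P_R \end{bmatrix}
= \mathbf{P} \begin{bmatrix} R' \\ G' \\ B' \end{bmatrix},
\qquad
\mathbf{P} = \begin{bmatrix}
 0.299 & 0.587 & 0.114 \\
-0.169 & -0.331 & 0.5 \\
 0.5 & -0.419 & -0.081
\end{bmatrix}
$$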

The gray rectangle in Figure 24.3 groups together the 
weighted adder that forms luma with the pair of color 
difference subtractors; the combination is equivalent to 
matrix multiplication by the 3x3 matrix P shown in 
Equation 24.1 in the margin. The numerical values used 
in Equation 24.1, in Figure 24.3, and in subsequent 
figures in this chapter all reflect the Rec. 601 luma
coefficients used in SDTV. Unfortunately, the coefficients for
HDTV are different, as I will describe in Component
video color coding for HDTV, on page 313.
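Because the weighted adder, the two subtractors, and the scaling are all linear, feeding unit R', G', and B' signals through them recovers the columns of P. A minimal Python sketch, assuming the scale factors 1/(2(1 - KB)) and 1/(2(1 - KR)) that bring B'-Y' and R'-Y' to a range of ±0.5. Function and variable names are illustrative:

```python
def encode_ypbpr(r, g, b, kr=0.299, kb=0.114):
    """Luma/color difference encoder (Rec. 601 coefficients by default):
    weighted adder, then subtractors, then scaling to PB and PR."""
    kg = 1.0 - kr - kb                 # luma coefficients sum to unity
    y = kr * r + kg * g + kb * b       # luma: weighted sum of R', G', B'
    pb = (b - y) / (2.0 * (1.0 - kb))  # B'-Y' spans +/-(1-kb); scale to +/-0.5
    pr = (r - y) / (2.0 * (1.0 - kr))  # R'-Y' spans +/-(1-kr); scale to +/-0.5
    return y, pb, pr

# The encoder is linear, so unit inputs yield the columns of the matrix P.
columns = [encode_ypbpr(*unit) for unit in ((1, 0, 0), (0, 1, 0), (0, 0, 1))]
P = [[round(col[row], 3) for col in columns] for row in range(3)]
for row in P:
    print(row)
```

The printed rows reproduce Equation 24.1 to three decimals. Passing Rec. 709 values for kr and kb yields the HDTV matrix instead.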
Figure 24.4 Conventional
luma/color difference decoder 

Figure 24.4 illustrates a conventional luma/color differ- 
ence decoder. In a digital decoder, the color difference 
(chroma) components are horizontally (and, in some 
applications, spatially) interpolated; in an analog 
decoder, no circuitry is required to perform this func- 
tion. Luma is added to the color difference components 
to reconstruct nonlinear blue and red components. 

A weighted sum of luma, blue, and red is then formed 
to reconstruct the nonlinear green component. 

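The blue and red color difference adders and the
weighted adder that recovers green, all enclosed by the
gray rectangle of Figure 24.4, can be considered
together as multiplication by the 3x3 matrix P^-1 shown
in Equation 24.2. These values are for SDTV; the matrix
for HDTV is different.

Eq 24.2 Rec. 601 Y'PBPR
decoding matrix (for SDTV)

$$
\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix}
= \mathbf{P}^{-1} \begin{bmatrix} {}^{601}Y' \\ P_B \\ P_R \end{bmatrix},
\qquad
\mathbf{P}^{-1} = \begin{bmatrix}
1 & 0 & 1.402 \\
1 & -0.344 & -0.714 \\
1 & 1.772 & 0
\end{bmatrix}
$$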

To produce linear-light tristimulus components, all 
three components are subject to the inverse transfer 
function sketched at the right with dashed outlines. 
Usually, a decoder is used with a CRT that has an 
intrinsic 2.5-power function, or with some other display 
that incorporates a 2.5-power function; in either case, 
the transfer function need not be explicitly computed. 
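The decoder operations described above can be sketched the same way. Luma is added back to each scaled color difference, and then the luma equation is solved for G'. A minimal Python sketch, assuming Rec. 601 coefficients. Decoding unit inputs yields the columns of the matrix P^-1 of Equation 24.2:

```python
def decode_ypbpr(y, pb, pr, kr=0.299, kb=0.114):
    """Y', PB, PR -> R', G', B' (Rec. 601 coefficients by default)."""
    kg = 1.0 - kr - kb
    b = y + 2.0 * (1.0 - kb) * pb    # B' = Y' + (B'-Y'); 1.772 PB for Rec. 601
    r = y + 2.0 * (1.0 - kr) * pr    # R' = Y' + (R'-Y'); 1.402 PR for Rec. 601
    g = (y - kr * r - kb * b) / kg   # weighted sum of luma, red, blue gives G'
    return r, g, b

# The decoder is linear, so unit inputs yield the columns of P^-1.
columns = [decode_ypbpr(*unit) for unit in ((1, 0, 0), (0, 1, 0), (0, 0, 1))]
P_inv = [[round(col[row], 3) for col in columns] for row in range(3)]
for row in P_inv:
    print(row)
```

The middle row gives the green coefficients of about -0.344 and -0.714, which do not appear anywhere in the encoder.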
Luminance and luma notation



See Appendix A, YUV and luminance 
considered harmful, on page 595. 



In Luminance from red, green, and blue, on page 207,
I described how relative (linear-light) luminance,
proportional to intensity, can be computed as an
appropriately weighted sum of RGB.

In video, the luminance of color science isn't computed. 
Instead, we compute a nonlinear quantity luma as 
a weighted sum of nonlinear (gamma-corrected) R'G'B’. 
The weights - or luma coefficients - are related to the 
luminance coefficients. The luma coefficients specified 
in Rec. 601 have been ubiquitous for SDTV, but new 
and different weights have been introduced in HDTV 
standards. In my opinion, the luma coefficients need 
not and should not have been changed for HDTV: 
Complexity is added to upconversion and downconver- 
sion in the studio and consumer equipment, for no 
improvement in performance or quality. 

Television standards documents historically used the 
prime symbol (') - often combined with the letter E for 
voltage - to denote a component that incorporates 
gamma correction. For example, E'R historically denoted
the gamma-corrected red channel. Gamma correction is 
nowadays so taken for granted in video that the E and 
the prime symbol are usually elided. This has led to 
much confusion among people attempting to utilize 
video technology in other domains. 

The existence of several standard sets of primary chro- 
maticities, the introduction of new coefficients, and 
continuing confusion between luminance and luma all 
beg for a notation to distinguish among the many 
possible combinations. In the absence of any standard 
notation, I was compelled to invent my own. 

Figure 24.5 at the top of the facing page sketches the 
notation that I use. The base symbol is Y, R, G, or B. The 
subscript denotes the standard that specifies the chro- 
maticities of the primaries and white. An unprimed 
letter indicates a linear-light tristimulus component (R,
G, or B), or relative luminance (Y). A prime symbol (')
indicates a nonlinear (gamma-corrected) component
(R', G', or B'), or luma (Y').
Figure 24.5 Luminance and
luma notation is necessary
because different primary
chromaticity sets, different luma
coefficients, and different
component scale factors are in
use. Unity scaling suffices for
components in this chapter; in
succeeding chapters, other
scale factors will be introduced.

[Figure 24.5 sketch: an annotated Y'. Leading superscript: luminance or luma coefficients (Rec. 601, SMPTE 240M, or Rec. 709). Prime: nonlinear (gamma-corrected, or luma) component. Leading subscript: scaling (1, implicit; steps; or millivolts). Trailing subscript: chromaticity (Rec. 709, SMPTE 240M, or EBU).]


For luminance or luma, a leading superscript indicates 
the standard specifying the weights used; historically, 
the weights of Rec. 601 were implicit, but recent HDTV 
standards such as Rec. 709 and SMPTE 240M call for 
different weights. Finally, the leading subscript indi- 
cates the overall scaling of the signal. If omitted, an 
overall scaling of unity is implicit, otherwise an integer 
such as 219, 255, or 874 specifies the black-to-white 
excursion in a digital system, or a number such as 661, 
700 or 714 specifies the analog excursion in millivolts. 

Typesetting Y'CBCR (or Y'PBPR) is a challenge! I illustrate
the main points in Figure 24.6 below. Y' is augmented 
with shaded leading superscript and subscript and 
a trailing subscript, according to the conventions of 
Figure 24.5. Without these elements, the intended 
color cannot be determined with certainty. 



Figure 24.6 Typesetting Y'CBCR
is a challenge! Luma coefficient
set, scaling, and chromaticities
are set out as in
Figure 24.5 above. The prime
should always be present, to
distinguish luma from the luminance
of color science. C is
appropriate for digital signals,
P for analog. Subscripts B and
R serve as tags, not variables:
They should be in Roman type,
not italics. B comes before R.
Nonlinear red, green, blue (R'G'B')

Now that I have explained the overall signal flow of 
video, and introduced my notation for the basic 
components, I will detail the encoding of luma and 
color difference signals, starting with the formation of 
nonlinear R', G', and B' primary components.

Video originates with approximations of linear-light
(tristimulus) RGB primary components, usually represented
in abstract terms in the range 0 (black) to
+1 (white). In order to meaningfully determine a color
from an RGB triple, the colorimetric properties of the
primaries and the reference white - such as their
CIE [x, y] chromaticity coordinates - must be known.
Colorimetric properties of RGB components were
discussed in Color science for video, on page 233. In the
absence of any specific information, use the Rec. 709
primaries and the CIE D65 white point.

In Gamma, on page 257, I described how lightness 
information is coded nonlinearly, in order to achieve 
good perceptual performance from a limited number of 
bits. In a color system, the nonlinear transfer function 
described in that chapter is applied individually to each 
of the three RGB tristimulus components: From the set 
of RGB tristimulus (linear-light) values, three gamma- 
corrected primary signals are computed; each is approx- 
imately proportional to the square-root of the corre- 
sponding scene tristimulus value. 

I detailed the Rec. 709 transfer function on page 263. 
Although standardized for HDTV, it is now applied to 
conventional video as well. For tristimulus values 
greater than a few percent, use these equations: 

$$
\begin{aligned}
R'_{709} &= 1.099\,R^{0.45} - 0.099 \\
G'_{709} &= 1.099\,G^{0.45} - 0.099 \\
B'_{709} &= 1.099\,B^{0.45} - 0.099
\end{aligned}
\qquad \text{Eq 24.3}
$$

The obsolete SMPTE 240M standard for 1035i30 HDTV
specified transfer function parameters slightly different
from those of Rec. 709. For tristimulus values greater
than a few percent, use these equations:

$$
\begin{aligned}
R'_{240} &= 1.1115\,R^{0.45} - 0.1115 \\
G'_{240} &= 1.1115\,G^{0.45} - 0.1115 \\
B'_{240} &= 1.1115\,B^{0.45} - 0.1115
\end{aligned}
\qquad \text{Eq 24.4}
$$

The sRGB specification for desktop computing uses 
numerical values slightly different again (page 267). For 
tristimulus values greater than a few percent: 

$$
\begin{aligned}
R'_{\mathrm{sRGB}} &= 1.055\,R^{1/2.4} - 0.055 \\
G'_{\mathrm{sRGB}} &= 1.055\,G^{1/2.4} - 0.055 \\
B'_{\mathrm{sRGB}} &= 1.055\,B^{1/2.4} - 0.055
\end{aligned}
\qquad \text{Eq 24.5}
$$
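Equations 24.3 through 24.5 cover only tristimulus values above a few percent. Near black, each standard switches to a linear segment. A minimal Python sketch of all three functions, including those segments. The breakpoints and slopes are not given in this section and are stated here as an assumption to be checked against the standards: 4.5 below 0.018 for Rec. 709, 4.0 below 0.0228 for SMPTE 240M, and 12.92 at or below 0.0031308 for sRGB.

```python
def rec709_oetf(L):
    """Rec. 709: assumed linear segment below 0.018, Eq 24.3 above."""
    if L < 0.018:
        return 4.5 * L
    return 1.099 * L ** 0.45 - 0.099

def smpte240m_oetf(L):
    """SMPTE 240M: assumed linear segment below 0.0228, Eq 24.4 above."""
    if L < 0.0228:
        return 4.0 * L
    return 1.1115 * L ** 0.45 - 0.1115

def srgb_encode(L):
    """sRGB: assumed linear segment at or below 0.0031308, Eq 24.5 above."""
    if L <= 0.0031308:
        return 12.92 * L
    return 1.055 * L ** (1 / 2.4) - 0.055

for L in (0.0, 0.01, 0.18, 0.5, 1.0):
    print(f"{L:4.2f}  709: {rec709_oetf(L):.4f}  "
          f"240M: {smpte240m_oetf(L):.4f}  sRGB: {srgb_encode(L):.4f}")
```

All three functions map 0 to 0 and 1 to 1. Their differences show up mainly in the darker tones.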

Rec. 601 luma 

The following luma equation is standardized in 
Rec. 601 for SDTV, and also applies to JPEG/JFIF (in 
computing) and Exif (in digital still photography): 

$$ {}^{601}Y' = 0.299\,R' + 0.587\,G' + 0.114\,B' \qquad \text{Eq 24.6} $$

As mentioned a moment ago, the E and prime symbols 
originally used for video signals have been elided over 
the course of time, and this has led to ambiguity of the 
Y symbol between color science and television. 



The coefficients in the luma equation are based upon 
the sensitivity of human vision to each of the RGB 
primaries standardized for the coding. The low value of 
the blue coefficient is a consequence of saturated blue 
colors having low lightness. The luma coefficients are 
also a function of the white point, or more properly, the 
chromaticity of reference white. 



The Rec. 601 luma coefficients were 
computed using the technique that 
I explained in Luminance coeffi- 
cients, on page 250, using the NTSC 
primaries and white point of 
Table 22.2, on page 238. 



In principle, luma coefficients should be derived from
the primary and white chromaticities. The Rec. 601
luma coefficients of Equation 24.6 were established in
1953 by the NTSC from the primaries and white point
then in use. Primaries have changed over the years
since the adoption of NTSC. The primaries in use for
480i today are approximately those specified in SMPTE
RP 145; the primaries in use for 576i are approximately
those specified in EBU Tech. 3213. (These
primary sets are slightly different; both sets very nearly
match the primaries of Rec. 709.) Despite the change in
primaries, the luma coefficients for 480i and 576i
video have remained unchanged from the values that
were established in 1953. As a consequence of the
change in primaries, the luma coefficients in SDTV no
longer theoretically match the primaries. The mismatch
has little practical significance.

The mismatch between the
primaries and the luma coefficients
of SDTV has little practical
significance; however, the
mismatch of luma coefficients
between SDTV and HDTV has
great practical significance!

Rec. 709 luma 

International agreement on Rec. 709 was achieved in 
1990 on the basis of "theoretically correct" luma coeffi- 
cients derived from the Rec. 709 primaries: 

$$ {}^{709}Y' = 0.2126\,R' + 0.7152\,G' + 0.0722\,B' \qquad \text{Eq 24.7} $$
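The "theoretically correct" coefficients can be reproduced from chromaticities. First, solve for the amounts of the three primaries that sum to reference white at unit luminance. Because each primary is normalized to Y = 1, those amounts are the luma weights. A minimal Python sketch using Cramer's rule. The Rec. 709 primaries, the D65 white point, the 1953 NTSC primaries, and Illuminant C are supplied here as assumptions, since they are not tabulated in this section. All helper names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(m, v):
    """Solve m s = v for s by Cramer's rule."""
    d = det3(m)
    solution = []
    for col in range(3):
        # Replace column `col` of m with the right-hand side v.
        replaced = [row[:col] + [v[i]] + row[col + 1:] for i, row in enumerate(m)]
        solution.append(det3(replaced) / d)
    return solution

def xyz_unit_y(x, y):
    """XYZ of chromaticity (x, y), normalized so that Y = 1."""
    return [x / y, 1.0, (1.0 - x - y) / y]

def luma_coefficients(primaries, white):
    """Relative-luminance weights of R, G, B from chromaticities."""
    cols = [xyz_unit_y(x, y) for x, y in primaries]
    m = [[cols[c][r] for c in range(3)] for r in range(3)]  # columns R, G, B
    # Scale factors such that R = G = B = 1 reproduces white with Y = 1.
    # Each column has Y = 1, so the scale factors are the luminance weights.
    return solve3(m, xyz_unit_y(*white))

REC709 = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
D65 = (0.3127, 0.3290)
NTSC_1953 = [(0.67, 0.33), (0.21, 0.71), (0.14, 0.08)]
ILLUMINANT_C = (0.3101, 0.3162)

print([round(k, 4) for k in luma_coefficients(REC709, D65)])
print([round(k, 3) for k in luma_coefficients(NTSC_1953, ILLUMINANT_C)])
```

The first line reproduces Equation 24.7. The second, from the 1953 NTSC set, lands on the Rec. 601 values of Equation 24.6.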

SMPTE 240M-1988 luma 

Two years before Rec. 709 was adopted, SMPTE stan- 
dardized luma coefficients for 1035/30 HDTV that were 
"theoretically correct" for the SMPTE RP 145 primaries 
in use at the time: 

$$ {}^{240}Y' = 0.212\,R' + 0.701\,G' + 0.087\,B' \qquad \text{Eq 24.8} $$
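The practical consequence of three coefficient sets is that the same R'G'B' values yield different luma. A minimal Python sketch applying Equations 24.6, 24.7, and 24.8 to a few saturated colors. The dictionary and function names are illustrative:

```python
LUMA_COEFFS = {
    "601":  (0.299, 0.587, 0.114),     # Eq 24.6
    "709":  (0.2126, 0.7152, 0.0722),  # Eq 24.7
    "240M": (0.212, 0.701, 0.087),     # Eq 24.8
}

def luma(rgb_prime, standard):
    """Weighted sum of gamma-corrected R', G', B' for a coefficient set."""
    kr, kg, kb = LUMA_COEFFS[standard]
    r, g, b = rgb_prime
    return kr * r + kg * g + kb * b

for color, rgb in (("red", (1, 0, 0)), ("green", (0, 1, 0)), ("blue", (0, 0, 1))):
    values = "  ".join(f"{s}: {luma(rgb, s):.4f}" for s in LUMA_COEFFS)
    print(f"{color:5s}  {values}")
```

Every set gives unity luma for white and zero for black. The disagreement appears only in colored areas, which is why mismatched decoding shows up as a color error rather than a gray-scale error.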

Following the establishment of Rec. 709, the successors
to SMPTE 240M - such as SMPTE 274M for
1920x1080 HDTV - specified the primaries, transfer
function, and luma coefficients of Rec. 709. However,
provisions were made in these standards to accommodate
the 240M parameters as an "interim implementation,"
and the 240M parameters remain in use in some
of the HDTV equipment that is deployed as I write this.
The most recent revision of SMPTE 274M dispenses
with the "interim implementation" and embraces
Rec. 709.

Chroma subsampling, revisited 

The purpose of color difference coding is to enable 
subsampling. In analog video, the color difference 
components are subject to bandwidth reduction 
through the use of analog lowpass filters; horizontal 
color detail is removed. In digital video, the chroma 
components are subsampled, or decimated, by filtering 
followed by the discarding of samples. Figure 10.3, 
Chroma subsampling, on page 90, sketches several 
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digital subsampling schemes. In 4:2:2 subsampling, 
after filtering, alternate color difference samples are 
discarded at the encoder. In 4:2:0, vertical chroma 
detail is removed as well. At the decoder, the missing 
samples are approximated by interpolation. 

In analog chroma bandlimiting, and in digital subsam- 
pling, some color detail is lost. However, owing to the 
poor color acuity of vision, the loss cannot be detected 
by a viewer at normal viewing distance. 

Some low-end digital video systems simply drop 
chroma pixels at the encoder without filtering, and 
replicate chroma pixels at the decoder. Discarding 
samples can be viewed as point sampling; that opera- 
tion runs the risk of introducing aliases. Proper decima- 
tion and interpolation filters should be used; these 
should be designed according to the principles 
explained in Filtering and sampling, on page 141. 
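The difference between point sampling and proper decimation is easiest to see on the finest chroma detail a row can carry. A minimal Python sketch of 2:1 horizontal decimation and interpolation, as in 4:2:2. The 3-tap [1, 2, 1]/4 kernel and linear interpolation are crude stand-ins chosen for brevity. They are not the filters that Rec. 601 specifies or that Filtering and sampling would design. Function names are hypothetical.

```python
def decimate_2to1(chroma):
    """Lowpass with a [1, 2, 1]/4 kernel (edge samples replicated),
    then discard every other sample."""
    n = len(chroma)
    filtered = []
    for i in range(n):
        left = chroma[max(i - 1, 0)]
        right = chroma[min(i + 1, n - 1)]
        filtered.append((left + 2 * chroma[i] + right) / 4)
    return filtered[::2]

def interpolate_1to2(sub, n):
    """Rebuild n samples: co-sited samples are copied; each missing
    sample is the mean of its neighbours (linear interpolation)."""
    out = []
    for i in range(n):
        j = i // 2
        if i % 2 == 0:
            out.append(sub[j])
        else:
            nxt = sub[min(j + 1, len(sub) - 1)]
            out.append((sub[j] + nxt) / 2)
    return out

row = [0, 1, 0, 1, 0, 1, 0, 1]      # chroma alternating at every sample
print("point sampled:", row[::2])   # alias: the detail collapses to a flat 0
sub = decimate_2to1(row)
print("filtered:     ", sub)        # settles near the true average, 0.5
print("interpolated: ", interpolate_1to2(sub, len(row)))
```

Point sampling turns the alternating pattern into a flat level that depends only on sampling phase, an alias. The filtered version converges on the average of the detail, 0.5, which is what vision would perceive anyway at normal viewing distance.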

Luma/color difference summary 

When luma and color difference coding is used for 
image interchange, it is important for the characteris- 
tics of red, green, and blue to be maintained from the 
input of the encoder to the output of the decoder. The 
chromaticities of the primaries were detailed in Color 
science for video, on page 233, and mentioned in this 
chapter as they pertain to the encoding and decoding 
of luma. I have assumed that the characteristics of the 
primaries match across the whole system. The prima- 
ries upon which luma and color difference coding are 
based are known as the interchange (or transmission) 
primaries. 

In practice, a camera sensor may produce RGB compo- 
nents whose chromaticities do not match the inter- 
change primaries. To achieve accurate color 
reproduction in such a camera, it is necessary to insert 
a 3x3 matrix that transforms tristimulus signals from 
the image capture primaries to the interchange prima- 
ries. (This is the "linear matrix" built into the camera.) 
Similarly, a decoder may be required to drive a display 
whose primaries are different from the interchange 
primaries; at the output of the decoder, it may be
necessary to insert a 3x3 matrix that transforms from
the interchange primaries to the image display
primaries. (See page 252.)

[Figure 24.7 block diagram: XYZ or R1G1B1 → tristimulus 3x3 ("linear matrix") → RGB → transfer function → R'G'B' → nonlinear 3x3 → Y'CBCR → chroma subsampling → Y'CBCR, e.g., 4:2:2]

Figure 24.7 Luma/color difference encoder involves the four stages summarized in this block
diagram. First, linear-light (tristimulus) input signals are transformed through the "linear" matrix
T1^-1 to produce RGB coded to the interchange primaries. Gamma correction is then applied. The
matrix P then produces luma and two color differences. The color difference (chroma) signals are
then subsampled; luma undergoes a compensating delay.



I use T to denote the encoding 
"linear matrix," to conform to the 
notation of Luminance coeffi- 
cients, on page 250. 



Interchange primaries are also 
called transmission primaries. 



Figure 24.7 above summarizes luma/color difference
encoding. If image data originated in linear XYZ components,
a 3x3 matrix transform (T1^-1) would be applied to
obtain linear RGB having chromaticities and white reference
of the interchange primaries. For Rec. 709 interchange
primaries standard for SDTV and HDTV, the
matrix would be that of Equation 22.9, on page 251.
More typically, image data originates in some
device-dependent space that I denote R1G1B1, and the 3x3
"linear matrix" transform (T1^-1) is determined by the
camera designer. See the sequence of Figures 22.3
through 22.8, starting on page 244, and the accompanying
text and captions, to gain an appreciation for how
such a matrix might be crafted. Practical cameras do not
have spectral sensitivities that are linear combinations
of the CIE color matching functions, so they are not
properly characterized by chromaticities. Nonetheless,
once a linear matrix to a set of interchange primaries
has been chosen, Equation 22.10 can be used to derive
equivalent sensor primaries (the "taking primaries").



Once the linear matrix has been applied, each of the
components is subject to a nonlinear transfer function
(gamma correction) that produces nonlinear R'G'B'.
These components are transformed through a 3x3
matrix (P), to obtain luma and color difference components
Y'CBCR or Y'PBPR. (This matrix depends upon the
luma coefficients in use, and upon color difference scale
factors.) Then, if necessary, a chroma subsampling filter
is applied to obtain subsampled color difference
components; luma is subject to a compensating delay.

[Figure 24.8 block diagram: Y'CBCR, e.g., 4:2:2 → chroma interpolation → Y'CBCR → P^-1 → R'G'B' → transfer function → RGB → T2 → XYZ or R2G2B2]

Figure 24.8 Luma/color difference decoder involves the inverse of the four stages of Figure 24.7
in opposite order. First, subsampled color difference (chroma) signals are interpolated; luma
undergoes a compensating delay. The matrix P^-1 then recovers R'G'B' from luma and two color
differences. A transfer function having an exponent of about 2.5 is then applied, which produces
linear-light (tristimulus) signals RGB. If the display's primaries differ from the interchange
primaries, RGB are transformed through the matrix T2 to produce appropriate R2G2B2.

Figures 24.7 and 24.8 show
3x3 matrix transforms being
used for two distinctly different
tasks. When someone hands
you a 3x3, you have to ascertain
whether it is intended for
a linear or nonlinear task.

A decoder uses the inverse operations of the encoder, 
in the opposite order, as sketched in Figure 24.8. In 
a digital decoder, the chroma interpolation filter recon- 
structs missing chroma samples; in an analog decoder, 
no explicit operation is needed. The 3x3 color difference
matrix (P^-1) reconstructs nonlinear red, green, and
blue primary components. The transfer functions restore
the primary components to their linear-light tristimulus
values. Finally, the tristimulus 3x3 matrix (T2) transforms
from the primaries of the interchange standard to
the primaries implemented in the display device.

When a decoder is intimately associated with a CRT 
monitor, the decoder's transfer function is performed by 
the nonlinear voltage-to-luminance relationship 
intrinsic to the CRT: No explicit operations are required 
for this step. However, to exploit this transfer function,
the display primaries must be the same as - or at least 
very similar to - the interchange primaries. 
The transfer functions of the decoder (or the CRT) are
intertwined with gamma correction. As I explained on 
page 81, an end-to-end power function having an 
exponent of about 1.25 is appropriate for typical televi- 
sion viewing in a dim surround. The encoder of 
Figure 24.7 imposes a 0.5-power function; the decoder 
of Figure 24.8 imposes a 2.5-power. The product of 
these implements the end-to-end power function. 



Owing to the dependence of the 
optimum end-to-end power func- 
tion upon viewing conditions, 
there ought to be a user
control for rendering intent - 
perhaps even replacing brightness 
and contrast - but there isn't! 



When viewing in a light surround, a 1.125 end-to-end 
power is appropriate; when driving a CRT, a 0.9-power 
function should intervene. When viewing in a dark 
surround, a 1.5 end-to-end power is appropriate; 
a 1.2-power function should intervene. If the transfer 
function of a display device differs from that of a CRT, 
then decoding should include a transfer function that is 
the composition of a 2.5-power function and the 
inverse transfer function of the display device. 
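The surround-dependent numbers above are products of power-function exponents. A minimal Python sketch of that arithmetic, plus the composition described in the last sentence. The display exponent of 2.2 in the usage line is a hypothetical value for a non-CRT display, chosen only for illustration:

```python
ENCODER_POWER = 0.5   # approximate effective exponent of gamma correction
CRT_POWER = 2.5       # approximate exponent intrinsic to a CRT

def intervening_power(target_end_to_end):
    """Extra power function needed between decoder and CRT so that
    encoder x intervening x CRT reaches the target end-to-end exponent."""
    return target_end_to_end / (ENCODER_POWER * CRT_POWER)

def drive_for_display(signal, display_power):
    """Drive level for a display whose light output is drive ** display_power,
    chosen so that the light equals signal ** CRT_POWER: the composition of
    a 2.5-power function with the inverse of the display's transfer function."""
    return (signal ** CRT_POWER) ** (1.0 / display_power)

print("dim surround, end-to-end:", ENCODER_POWER * CRT_POWER)
print("light surround, intervening power:", intervening_power(1.125))
print("dark surround, intervening power:", intervening_power(1.5))
print("drive for signal 0.5 on a hypothetical 2.2-power display:",
      round(drive_for_display(0.5, 2.2), 4))
```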



If the display primaries match the interchange prima- 
ries, the decoder's 3x3 tristimulus matrix is not needed. 
If a CRT display has primaries not too different from the 
interchange primaries, then it may be possible to 
compensate the primaries by applying a 3x3 matrix in 
the nonlinear domain. But if the primaries are quite 
different, it will be necessary to apply the transform 
between primaries in the tristimulus domain; see Transforms
among RGB systems, on page 252.

SDTV and HDTV luma chaos 

Although the concepts of Y'PBPR and Y'CBCR coding are
identical in SDTV and HDTV, the Rec. 709 standard
has - unfortunately, in my opinion - established a new
set of luma coefficients for HDTV. That set differs
dramatically from the luma coefficients for SDTV specified
in Rec. 601. There are now two flavors of Y'CBCR
coding, as suggested by Figure 24.9 in the margin;
I denote the flavors ⁶⁰¹Y'CBCR for SDTV, and ⁷⁰⁹Y'CBCR
for HDTV. Similarly, there are two flavors of Y'PBPR for
analog systems, ⁶⁰¹Y'PBPR for SDTV, and ⁷⁰⁹Y'PBPR for
HDTV. (A third luma coefficient set was specified in
SMPTE 240M-1988; though obsolete, that set
continues to be used in legacy 1035i equipment.)

[Figure 24.9 sketch: two flavors of Y'CBCR coding, ⁶⁰¹Y'CBCR for SDTV and ⁷⁰⁹Y'CBCR for HDTV]

Figure 24.9 Luma/color
difference flavors
It's sensible to use the term
colorbar test signal (or pattern)
instead of colorbar test image:
The image is not standardized.

System               Primary chromaticity   Transfer function   Luma coefficients
480i                 SMPTE RP 145           Rec. 709            Rec. 601
576i                 EBU Tech. 3213         Rec. 709            Rec. 601
1035i                SMPTE RP 145           SMPTE 240M          SMPTE 240M
720p, 1080i, 1080p   Rec. 709               Rec. 709            Rec. 709

Table 24.1 Chromaticity, transfer function, and luma
combinations circa 2002 are summarized.



In my view, it is extremely unfortunate that different 
coding has been adopted: Image coding and decoding 
now depend on whether the picture is small (conven- 
tional video, SDTV) or large (HDTV); that dependence 
erodes the highly useful concept of resolution-independent
production in the Y'CBCR 4:2:2 and 4:2:0 domains.
In my opinion, Y'CBCR should have been standardized
with CB and CR having identical excursion to luma, and
HDTV should have been standardized with the Rec. 601
luma coefficients. With things as they stand, the smorgasbord
of color-encoding parameters makes accurate
image interchange extremely difficult. The situation is 
likely to get worse with time, not better. 

Table 24.1 above summarizes the standards for primary 
chromaticities, transfer functions, and luma coefficients 
that are either implicit or explicit in several SDTV and 
HDTV standards. When video is converted among these 
standards, appropriate processing should be performed 
in order to preserve the intended color. 

It is a problem that the colorbar test signal is standard- 
ized in the R'G'B' domain, without any reference to 
primaries, transfer function, or luma coefficients. The 
colors of the bars depend upon which primary chroma- 
ticities are in use; the luma and color difference levels 
of the bars depend upon which luma coefficients are in 
use. When color conversions and standards conver- 
sions are properly performed, the colors and levels of 
the colorbar test signal will change! 
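The shift in bar levels is easy to quantify. A minimal Python sketch computing 8-bit studio luma for colorbars under the Rec. 601 and Rec. 709 coefficient sets. It assumes 100% bars, with each R'G'B' component at 0 or 1 (other bar amplitudes are also in use), a black level of 16, and a luma excursion of 219:

```python
import math

BARS = {
    "white":   (1, 1, 1),
    "yellow":  (1, 1, 0),
    "cyan":    (0, 1, 1),
    "green":   (0, 1, 0),
    "magenta": (1, 0, 1),
    "red":     (1, 0, 0),
    "blue":    (0, 0, 1),
    "black":   (0, 0, 0),
}
COEFFS = {"601": (0.299, 0.587, 0.114), "709": (0.2126, 0.7152, 0.0722)}

def studio_luma8(rgb, coeffs):
    """8-bit studio luma (assumed 16 + 219 Y'), rounded half up."""
    kr, kg, kb = coeffs
    r, g, b = rgb
    y = kr * r + kg * g + kb * b
    return math.floor(16 + 219 * y + 0.5)

print(f"{'bar':8s} {'601':>4s} {'709':>4s}")
for name, rgb in BARS.items():
    print(f"{name:8s} {studio_luma8(rgb, COEFFS['601']):4d} "
          f"{studio_luma8(rgb, COEFFS['709']):4d}")
```

White and black are unaffected. Every colored bar lands on a different luma code, so a waveform monitor calibrated for one set will flag the other as wrong.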
Luma/color difference component sets

These color difference component sets, all based upon 
B'-Y' and R'-Y', are in use: 

Y'PBPR coding is used in component analog video; PB
and PR are scaled to have excursion nominally identical
to that of luma. Y'PBPR can be potentially based upon
any of three sets of luma coefficients: Rec. 601 for
SDTV, SMPTE 240M ("interim implementation") for
HDTV, or Rec. 709 for HDTV. In 480i29.97 SDTV, three
different analog interface standards are in use: EBU N10
("SMPTE"), Sony, and Panasonic.

Y'CBCR coding is used for component digital video; CB
and CR are scaled to have excursion 224/219 that of
luma. A "full-range" variant is used in JPEG/JFIF. Y'CBCR
can be potentially based upon Rec. 601, SMPTE 240M,
or Rec. 709 luma coefficients.
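To compare the studio and "full-range" scalings, encode the same R'G'B' both ways. A minimal Python sketch, assuming Rec. 601 coefficients. For the studio version it assumes codes of 16 to 235 for luma and 128 ± 112 for chroma. For the JFIF-style full-range version it assumes that luma and both color differences span 255 codes, with a chroma offset of 128 and results clipped to 0 through 255. These JFIF particulars are not given in this section.

```python
import math

KR, KG, KB = 0.299, 0.587, 0.114  # Rec. 601 luma coefficients

def to_code(x):
    """Round half up, then clip to the 8-bit range 0..255."""
    return min(255, max(0, math.floor(x + 0.5)))

def _ypbpr(r, g, b):
    y = KR * r + KG * g + KB * b
    return y, (b - y) / (2 * (1 - KB)), (r - y) / (2 * (1 - KR))

def full_range_ycbcr(r, g, b):
    """R', G', B' in [0, 1] -> full-range (JFIF-style) 8-bit Y'CbCr."""
    y, pb, pr = _ypbpr(r, g, b)
    return to_code(255 * y), to_code(128 + 255 * pb), to_code(128 + 255 * pr)

def studio_ycbcr(r, g, b):
    """R', G', B' in [0, 1] -> studio 8-bit Y'CbCr (224/219 chroma ratio)."""
    y, pb, pr = _ypbpr(r, g, b)
    return to_code(16 + 219 * y), to_code(128 + 224 * pb), to_code(128 + 224 * pr)

for name, rgb in (("white", (1, 1, 1)), ("blue", (0, 0, 1))):
    print(f"{name:5s} full range: {full_range_ycbcr(*rgb)}  studio: {studio_ycbcr(*rgb)}")
```

Saturated blue drives full-range CB to 255.5, which must be clipped. Studio coding leaves headroom above 240.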

In NTSC and PAL chroma modulation, on page 335, I will 
detail two additional component sets, whose proper 
use is limited to composite NTSC and PAL SDTV: 

Y'UV components are only applicable to composite 
NTSC and PAL systems. B'-Y' and R'-Y' are scaled so as 
to limit the excursion of the composite (luma plus 
modulated chroma) signal. Y'UV coding is always based 
upon Rec. 601 luma coefficients. 

Y'IQ components are only applicable to certain
composite NTSC systems. UV components are rotated
33° and axis-exchanged, to enable wideband-I transmission.
This is an obsolete technique that is rarely, if
ever, practiced nowadays. Y'IQ coding is always based
upon Rec. 601 luma coefficients.

The bewildering set of scale factors and luma coefficients
in use is set out in Table 24.2A opposite for
analog SDTV, Table 24.2B overleaf for digital SDTV and
computing systems, and Table 24.2C for analog and
digital HDTV. The following two chapters detail component
color coding for SDTV and HDTV, respectively.
Table 24.2A columns: System | Notation | Color difference scaling

1 System: Component analog video, 480i (EIA/CEA-770 and "SMPTE") and 576i EBU N10;
  also, 480i Panasonic M-II, zero setup (Japan)a
  Notation: ${}^{601}_{700}Y'_{145}P_BP_R$; ${}^{601}_{700}Y'_{EBU}P_BP_R$
  Color difference scaling: The EBU N10 standard calls for 7:3 picture-to-sync ratio,
  700 mV luma excursion with zero setup. PB and PR components are scaled individually
  to range ±350 mV, an excursion identical to luma.

2 System: Component analog video, 480i Sony, 7.5% setupa
  Color difference scaling: Sony de facto standards call for 10:4 picture-to-sync ratio,
  7.5% setup, and black-to-white luma excursion of approximately 661 mV. PB and PR
  components are scaled individually to range 4/3 times ±350 mV, that is, ±466 2/3 mV.

3 System: Component analog video, 480i Sony, zero setup (Japan)a
  Notation: ${}^{601}_{714}Y'_{145}P_BP_R$
  Color difference scaling: Sony de facto standards call for 10:4 picture-to-sync ratio,
  zero setup, and black-to-white luma excursion of approximately 714 mV. PB and PR
  components are scaled individually to range 4/3 times ±350 mV, that is, ±466 2/3 mV.

4 System: Component analog video, 480i Panasonic, 7.5% setupa
  Color difference scaling: Panasonic de facto standards call for 10:4 picture-to-sync
  ratio, zero setup, and black-to-white luma excursion of approximately 674.5 mV. PB and
  PR components are scaled individually to range 37/40 times ±350 mV, that is,
  ±323.75 mV.

5 System: Composite analog NTSC, PAL video (incl. S-video, Y/C688, etc.)
  Notation: various, typ. ${}^{601}_{700}Y'_{EBU}UV$, ${}^{601}_{714}Y'_{145}IQ$
  Color difference scaling: U and V are scaled to meet a joint constraint: Scaling is
  such that peak composite video - luma plus modulated chroma - is limited to 4/3 of
  the blanking-to-white excursion. Rotation and exchange of axes (e.g., I and Q) cannot
  be distinguished after analog encoding. There is no standard component interface.

Table 24.2A Color difference systems for analog SDTV. The EBU N10 levels indicated in the
shaded (first) row are sensible but unpopular. Designers of 480i SDTV studio equipment are forced
to implement configuration settings for three interface "standards": EBU N10 ("SMPTE"), Sony, and
Panasonic.

a The component analog interface for consumer equipment (such as DVD players) is properly
scaled Y'PBPR, according to EIA/CEA-770.2, cited on page 509. Some consumer equipment
has been engineered and deployed with incorrect Y'PBPR scaling. Certain consumer devices
have rear-panel connectors labelled Y, B-Y, R-Y, or YUV; these designations are plainly wrong.


CHAPTER 24 



LUMA AND COLOR DIFFERENCES 



299 



Table 24.2B Color difference systems for digital SDTV and computing. The scaling
indicated in the first row is recommended for new designs.

6. System: Component digital video: 4:2:0, 4:1:1, Rec. 601 4:2:2
   (incl. M-JPEG, MPEG, DVD, DVC)
   Notation: Y'CBCR, Rec. 601 luma
   Color difference scaling: Rec. 601 calls for luma range 0...219, offset +16 at the
   interface. CB and CR are scaled individually to range ±112, an excursion 224/219 of
   luma, offset +128 at the interface. Codes 0 and 255 are prohibited.

7. System: Component digital stillframe JPEG (incl. JFIF 1.02), typical desktop
   publishing and www. Transfer functions vary; see the marginal note on page 280.
   Notation: "full range" 255 Y'CBCR, Rec. 601 luma
   Color difference scaling: There is no comprehensive standard. Luma reference range is
   typically 0 through 255. CB and CR are typically scaled individually to a "full range"
   of ±128, an excursion 256/255 that of luma. CB and CR codes +128 are clipped; fully
   saturated blue and fully saturated red cannot be represented.

8. System: Composite digital video: 4fsc 576i PAL
   Notation: Y'UV
   Color difference scaling: U and V are scaled to meet a joint constraint such that
   peak composite video (luma plus modulated chroma) is limited to 4/3 of the
   blanking-to-white excursion. Obsolescent.

9. System: Composite digital video: 4fsc 480i NTSC (7.5% setup)
   Notation: Y'IQ
   Color difference scaling: Scaling is identical to Y'UV, but axes are rotated 33°,
   exchanged, and denoted I and Q. Obsolescent.

10. System: Composite digital video: 4fsc 480i NTSC-J (zero setup)
    Notation: Y'IQ
    Color difference scaling: Scaling is identical to Y'UV, but axes are rotated 33°,
    exchanged, and denoted I and Q. Obsolescent.



Table 24.2C Color difference systems for HDTV. The luma coefficients for
HDTV differ dramatically from those of SDTV.

11. System: Component analog HDTV
    Notation: Y'PBPR, Rec. 709 luma
    Color difference scaling: 7:3 picture-to-sync ratio, 700 mV luma excursion with
    zero setup. PB and PR components are scaled individually to range ±350 mV, an
    excursion identical to luma.

12. System: Component digital HDTV (Rec. 709)
    Notation: Y'CBCR, Rec. 709 luma
    Color difference scaling: Rec. 709 calls for luma range 0...219, offset +16 at the
    interface. CB and CR are scaled individually to range ±112, an excursion 224/219 of
    luma, offset +128 at the interface. Codes 0 and 255 are prohibited.

13. System: Component digital HDTV (Rec. 1361)
    Notation: Y'CBCR, Rec. 1361
    Color difference scaling: Rec. 1361 Y'CBCR is identical to Rec. 709 Y'CBCR, except
    that some codewords outside the R'G'B' unit cube represent wide-gamut colors.
Component video color coding for SDTV
Various scale factors are applied to the basic color
difference components B'-Y' and R'-Y' for different
applications. In the previous chapter, I introduced luma
and color difference coding; in this chapter, I will detail
the following coding systems:

• B'-Y', R'-Y' components form the numerical basis for
all the other component sets; otherwise, they are not
directly used.

• PBPR components are used for component analog video
(including DVD player analog interfaces).

• CBCR components as defined in Rec. 601 are used for
component digital video, including studio video,
M-JPEG, and MPEG.

• "Full range" CBCR components are used in JPEG/JFIF.

• UV components are used for NTSC or PAL, as I will
describe on page 336.

• IQ components were historically used for NTSC, as I will
describe on page 367.

Video uses the symbols U and V to represent certain
color difference components. The CIE defines the pairs
[u, v], [u', v'], and [u*, v*]. All of these pairs represent
chromatic or chroma information, but they are all
numerically and functionally different. Video [U, V]
components are neither directly based upon, nor
superseded by, any of the CIE color spaces.

Y'UV and Y'IQ are intermediate quantities toward the
formation of composite NTSC, PAL, and S-video.
Neither Y'UV nor Y'IQ has a standard component
interface, and neither is appropriate when the components
are kept separate. Unfortunately, the Y'UV nomenclature
has come to be used rather loosely, and to some
people it now denotes any scaling of B'-Y' and R'-Y'.

For a discussion of primary chromaticities, see page 236.

I will detail the formation of true Y'UV and Y'IQ in NTSC
and PAL chroma modulation, on page 335.

The coding systems described in this chapter can be
applied to various RGB primary sets: EBU 3213,
SMPTE RP 145 (or potentially even Rec. 709). Rec. 601
does not specify primary chromaticities: SMPTE RP 145
primaries are implicit in 480i systems, and EBU 3213
primaries are implicit in 576i systems.

The equations for [Y', B'-Y', R'-Y'], Y'PBPR, and Y'CBCR
can be based upon either the Rec. 601 luma
coefficients of SDTV or the Rec. 709 coefficients of HDTV.
The equations and figures of this chapter are based
upon the Rec. 601 coefficients. Unfortunately, the luma
coefficients that have been standardized for HDTV are
different from those of Rec. 601. Concerning the HDTV
luma coefficients, see Rec. 709 luma on page 292; for
details of HDTV color difference components, see
Component video color coding for HDTV, on page 313.
Surprisingly, broadcasters in Japan apparently intend to
retrofit their SDTV broadcast plant with Rec. 709 luma
coefficients according to the equations that I will detail
in the following chapter, Component video color coding
for HDTV.

Chroma components are properly ordered B'-Y' then
R'-Y'; or PB then PR; or CB then CR. Blue associates
with U, and red with V; U and V are ordered
alphabetically. The subscripts in CBCR and PBPR are often
written in lowercase. In my opinion, this compromises
readability, so I write them in uppercase. The B in CB serves
as a tag, not a variable, so I set it in Roman type (not
italics). Authors with great attention to detail
sometimes "prime" CBCR and PBPR to indicate their nonlinear
origin, but because no practical image coding system
employs linear-light color differences, I consider it safe
to omit the primes.
[Figure 25.1 B'-Y', R'-Y' components for SDTV; axes B'-Y' and R'-Y']

ITU-R Rec. BT.601-5, Studio encoding parameters of digital
television for standard 4:3 and wide-screen 16:9 aspect ratios.

B'-Y', R'-Y' components for SDTV 

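To obtain [Y', B'-Y', R'-Y'] components from R'G'B', for
Rec. 601 luma, use this matrix equation:

$$
\begin{bmatrix} {}_{601}Y' \\ B'-{}_{601}Y' \\ R'-{}_{601}Y' \end{bmatrix}
=
\begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.299 & -0.587 & 0.886 \\ 0.701 & -0.587 & -0.114 \end{bmatrix}
\cdot
\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix}
\qquad \text{(Eq 25.1)}
$$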

Figure 25.1 shows a plot of the [B'-Y', R'-Y'] color 
difference plane. 

As I described on page 291, the Rec. 601 luma
coefficients are used for SDTV. With these coefficients, the
B'-Y' component reaches its positive maximum at
pure blue (R' = 0, G' = 0, B' = 1; Y' = 0.114;
B'-Y' = +0.886) and its negative maximum at pure
yellow (B'-Y' = -0.886). Analogously, the extrema of
R'-Y' take values ±0.701, at pure red and cyan. These
are inconvenient values for both digital and analog
systems. The PBPR, CBCR, and UV color difference
components all involve versions of [Y', B'-Y', R'-Y']
that are scaled to place the extrema of the component
values at more convenient values.
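As a check on these extrema, here is a minimal Python sketch of Equation 25.1. It assumes gamma-corrected R'G'B' floats in [0, 1]; the function name `color_differences_601` is illustrative only. Because the map is linear, its extrema over the unit cube occur at the cube's eight corners, so evaluating those corners should reproduce ±0.886 for B'-Y' and ±0.701 for R'-Y'.

```python
from itertools import product

# Rec. 601 luma coefficients (first row of Equation 25.1)
KR, KG, KB = 0.299, 0.587, 0.114


def color_differences_601(r, g, b):
    """Return (Y', B'-Y', R'-Y') for gamma-corrected R'G'B' in [0, 1]."""
    y = KR * r + KG * g + KB * b
    return y, b - y, r - y


# The extrema of a linear map over a cube occur at the cube's corners.
corners = list(product((0.0, 1.0), repeat=3))
b_minus_y = [color_differences_601(*c)[1] for c in corners]
r_minus_y = [color_differences_601(*c)[2] for c in corners]

print("B'-Y' range:", round(min(b_minus_y), 3), "to", round(max(b_minus_y), 3))
print("R'-Y' range:", round(min(r_minus_y), 3), "to", round(max(r_minus_y), 3))
```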

PBPR components for SDTV

PB and PR denote color difference components having
excursions nominally identical to the excursion of the
accompanying luma component. For Rec. 601 luma, the
equations are these:

$$
\begin{aligned}
P_B &= \frac{0.5}{1-0.114}\left(B' - {}_{601}Y'\right) = \frac{1}{1.772}\left(B' - {}_{601}Y'\right) \approx 0.564\left(B' - {}_{601}Y'\right) \\
P_R &= \frac{0.5}{1-0.299}\left(R' - {}_{601}Y'\right) = \frac{1}{1.402}\left(R' - {}_{601}Y'\right) \approx 0.713\left(R' - {}_{601}Y'\right)
\end{aligned}
\qquad \text{(Eq 25.2)}
$$

[Figure 25.2 PBPR components for SDTV]

These scale factors are chosen to limit the excursion of
each color difference component to the range -0.5 to
+0.5 with respect to unity luma excursion: 0.114 in the
first expression above is the luma coefficient of blue,
and 0.299 in the second is for red. Figure 25.2 above
shows a plot of the [PB, PR] plane.



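Expressed in matrix form, the B'-Y' and R'-Y' rows of
Equation 25.1 are scaled by 0.5/0.886 and 0.5/0.701.

To encode from R'G'B' where reference black is zero
and reference white is unity:

$$
\begin{bmatrix} {}_{601}Y' \\ P_B \\ P_R \end{bmatrix}
=
\begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.168736 & -0.331264 & 0.5 \\ 0.5 & -0.418688 & -0.081312 \end{bmatrix}
\cdot
\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix}
\qquad \text{(Eq 25.3)}
$$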



The first row of Equation 25.3 comprises the luma
coefficients; these sum to unity. The second and third rows
each sum to zero, a necessity for color difference
components. The two entries of 0.5 reflect the reference
excursions of PB and PR, at the blue and red primaries
[0, 0, 1] and [1, 0, 0]. The reference excursion is
±0.5; the peak excursion may be slightly larger, to
accommodate analog undershoot and overshoot. There
are no standards for how much analog footroom and
headroom should be provided.
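The derivation of Equation 25.3 from Equation 25.1 can be checked mechanically. This Python sketch (the names `EQ_25_1`, `SCALE`, and `EQ_25_3` are illustrative) scales the two color difference rows by 0.5/0.886 and 0.5/0.701, then prints each row with its sum, which should be 1, 0, and 0.

```python
# Rows of Equation 25.1, Rec. 601 luma coefficients
EQ_25_1 = [
    [0.299, 0.587, 0.114],    # Y'
    [-0.299, -0.587, 0.886],  # B'-Y'
    [0.701, -0.587, -0.114],  # R'-Y'
]

# Scale factors that bring the B'-Y' and R'-Y' extrema to +/-0.5
SCALE = [1.0, 0.5 / 0.886, 0.5 / 0.701]

# Equation 25.3 is Equation 25.1 with each row multiplied by its scale factor.
EQ_25_3 = [[s * c for c in row] for s, row in zip(SCALE, EQ_25_1)]

for row in EQ_25_3:
    print(["%.6f" % c for c in row], "sum = %.6f" % sum(row))
```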



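The inverse, decoding matrix is this:

$$
\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 1.402 \\ 1 & -0.344136 & -0.714136 \\ 1 & 1.772 & 0 \end{bmatrix}
\cdot
\begin{bmatrix} {}_{601}Y' \\ P_B \\ P_R \end{bmatrix}
\qquad \text{(Eq 25.4)}
$$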
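The decoding matrix follows from inverting Equation 25.3. A minimal Python sketch, assuming nothing beyond the Rec. 601 luma coefficients: it builds the encoding matrix from those coefficients, inverts it with the adjugate formula (the helper `invert3` is hypothetical, written for this example rather than taken from a library), and checks the decoding coefficients and a round trip.

```python
KR, KB = 0.299, 0.114
KG = 1.0 - KR - KB

# Equation 25.3 built from the luma coefficients
ENCODE = [
    [KR, KG, KB],
    [-KR / (2 * (1 - KB)), -KG / (2 * (1 - KB)), 0.5],
    [0.5, -KG / (2 * (1 - KR)), -KB / (2 * (1 - KR))],
]


def invert3(m):
    """Invert a 3x3 matrix given as nested lists, via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]


def apply(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


DECODE = invert3(ENCODE)
for row in DECODE:
    print(["%.6f" % x for x in row])

rgb = [0.2, 0.5, 0.9]
print(apply(DECODE, apply(ENCODE, rgb)))  # close to [0.2, 0.5, 0.9]
```

Building the matrix from KR and KB, rather than typing in rounded six-digit entries, keeps the inverse accurate to floating-point precision.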



See Table 24.2A on page 299;
Component analog Y'PBPR interface,
EBU N10, on page 508; and
Component analog Y'PBPR interface,
industry standard, on page 509.

Y'PBPR is employed by 480i and 576i component
analog video equipment such as that from Sony and
Panasonic, where PB and PR are conveyed with roughly
half the bandwidth of luma. Unfortunately, three
different analog interface level standards are used, so
Y'PBPR is ambiguous with respect to electrical interface.

PB and PR are properly written in that order, as
I described on page 302. The P stands for parallel,
stemming from a failed effort within SMPTE to
standardize a parallel electrical interface for component
analog video. In CBCR, which I will now describe, C
stands for chroma. The CBCR notation predated PBPR.

CBCR components for SDTV

A straightforward scaling of Y'PBPR components would
have been suitable for digital interface. Scaling of luma
to the range [0 ... 255] would have been feasible; this
"full range" scaling of luma is used in JPEG/JFIF in
computing, as I will describe on page 310. However, for
studio applications it is necessary to provide
signal-processing footroom and headroom to accommodate
ringing from analog and digital filters, and to
accommodate signals from misadjusted analog equipment.



For an 8-bit interface, luma could have been scaled to
an excursion of 224; B'-Y' and R'-Y' could have been
scaled to ±112. This would have left 32 codes of
footroom and headroom for each component. Although
sensible, that approach was not taken when Rec. 601
was adopted in 1984. Instead, and unfortunately in
my opinion, different excursions were standardized for
luma and chroma. Eight-bit luma excursion was
standardized at 219; chroma excursion was standardized at
224. Each color difference component has an excursion
224/219 that of luma. Since video component
amplitudes are usually referenced to luma excursion, this
condition is more clearly stated the opposite way: In
Y'CBCR, each color difference component has 224/219
the excursion of the luma component. The notation
CBCR distinguishes this set from PBPR, where the luma
and chroma excursions are nominally identical:
Conceptually, Y'PBPR and Y'CBCR differ only in scaling.

The Y'PBPR and Y'CBCR scaling discrepancy is unfortunate
enough, but it is compounded by "full range" Y'CBCR used
in JPEG/JFIF, scaled similarly but not identically to
Y'PBPR; see page 310. Confusion is further compounded
by the EBU referring in Technical Standard N10-1998 to
CBCR analog color difference components, when they are
properly denoted PBPR.

Historically, Y'PBPR scaling was used at analog
interfaces, and Y'CBCR was used at digital interfaces.
Nowadays so many different scale factors and offsets are in
use in both the analog and digital domains that the
dual nomenclature is more a hindrance than a help.

To provide footroom to accommodate luma signals that
go slightly negative, an offset is added to luma at a
Y'CBCR interface. At an 8-bit interface, an offset of +16
is added; this places black at code 16 and white at code
235. At an 8-bit interface, codes 0 and 255 are used for
synchronization purposes; these codes are prohibited
from video data. Codes 1 through 15 are interpreted as
signal levels -15/219 through -1/219 (respectively),
relative to unity luma excursion; codes 236 through 254
are interpreted as signal levels 220/219 through 238/219
(respectively), relative to unity excursion.
Unfortunately, luma footroom and headroom are asymmetrical.
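The mapping from 8-bit luma interface codes to signal levels can be written directly. In this Python sketch (the function name `luma_code_to_level` is hypothetical), exact fractions are used so the results can be compared with the fractions in the text; codes 0 and 255 raise an error because they are reserved for synchronization.

```python
from fractions import Fraction


def luma_code_to_level(code):
    """Map an 8-bit Rec. 601 luma interface code to a signal level.

    Reference black (code 16) maps to 0; reference white (code 235) maps to 1.
    """
    if not 1 <= code <= 254:
        raise ValueError("codes 0 and 255 are reserved for synchronization")
    return Fraction(code - 16, 219)


print(luma_code_to_level(1), luma_code_to_level(15))     # footroom extremes
print(luma_code_to_level(236), luma_code_to_level(254))  # headroom extremes

# Footroom and headroom are asymmetrical: 15 codes below black, 19 above white.
footroom_codes = 16 - 1
headroom_codes = 254 - 235
print(footroom_codes, headroom_codes)
```

Note that `Fraction` reduces to lowest terms, so -15/219 prints as -5/73; the value is the same.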

CBCR color difference components are conveyed in
offset binary form: An offset of +128 is added. In studio
Y'CBCR, chroma reference levels are 16 and 240, and
codes 0 and 255 are prohibited from chroma data.

Rec. 601 provides for 10-bit components; 10-bit studio 
video equipment is now commonplace. At a 10-bit 
interface, the 8-bit interface levels and prohibited codes 
are maintained; extra bits are appended as least-signifi- 
cant bits to provide increased precision. The prohibited 
codes respect the 8-bit interface: Codes having all 
8 most-significant bits either all zeros or all ones are 
prohibited from video data across a 10-bit interface. 
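The 10-bit rule amounts to a test on the top eight bits. A minimal Python sketch, assuming codes are plain integers 0 through 1023; `is_prohibited_10bit` is a hypothetical name.

```python
def is_prohibited_10bit(code):
    """True if a 10-bit code is reserved for synchronization.

    A code is prohibited when its 8 most-significant bits are all zeros
    or all ones, which preserves the 8-bit rule at a 10-bit interface.
    """
    if not 0 <= code <= 0x3FF:
        raise ValueError("not a 10-bit code")
    msbs = code >> 2  # drop the two appended least-significant bits
    return msbs in (0x00, 0xFF)


prohibited = [c for c in range(1024) if is_prohibited_10bit(c)]
print(prohibited)  # [0, 1, 2, 3, 1020, 1021, 1022, 1023]
```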
[Figure 25.3 CBCR components for SDTV are shown in their mathematical form; CB and CR axes each marked -112 to +112. The range outside [-112 ... +112] is available for undershoot and overshoot. At an 8-bit interface, an offset of +128 is added to each color difference component.]



For signal-processing arithmetic operations such as gain
adjustment, Y', CB, and CR must be zero for black: The
interface offsets must be removed. For 8-bit luma
arithmetic, it is convenient to place reference black at
code 0 and reference white at code 219. Color
difference signals are most conveniently handled in two's
complement form, scaled so that reference color
difference signals (at pure yellow, cyan, red, and blue) are
±112. Figure 25.3 above shows the CBCR color
difference plane scaled in this manner, without offsets.

As far as I am concerned, the offsets should be treated
as an interface feature. Most descriptions of Y'CBCR,
though, including SMPTE and ITU standards, take the
Y'CBCR notation to include the offset. In the equations
to follow, I set the offset terms apart from the matrix
terms. If your goal is to compute abstract, mathematical
quantities suitable for signal processing, omit the offsets.
If you are concerned with the interface, include them.

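These equations form Rec. 601 Y'CBCR components
from [Y', B'-Y', R'-Y'] components ranging [0 ... +1]:

$$
\begin{aligned}
{}^{219}_{601}Y' &= 16 + 219\,{}_{601}Y' \\
C_B &= 128 + 112\,\frac{1}{0.886}\left(B' - {}_{601}Y'\right) \\
C_R &= 128 + 112\,\frac{1}{0.701}\left(R' - {}_{601}Y'\right)
\end{aligned}
\qquad \text{(Eq 25.5)}
$$

The numerical values used in this
equation, and in those to follow,
are based on the Rec. 601 luma
coefficients. The coefficients for
HDTV are, unfortunately, different.
See Rec. 601 luma, on page 291.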
To extend Equation 25.5 to 10 bits, append to each of
Y', CB, and CR two low-order bits having binary weights
1/2 and 1/4. To extend Y'CBCR beyond 10 bits, continue
the sequence with LSBs weighted 1/8, 1/16, and so on. If
you prefer to express these quantities as whole
numbers, without fractional bits, multiply Equation 25.5
(and all of the equations to follow) by $2^{k-8}$,
where k > 8 denotes the number of bits.
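Multiplying by 2^(k-8) is simple to sketch. This hypothetical helper `level_at_k_bits` (not from any standard API) re-expresses an 8-bit interface level, possibly with fractional bits, as a k-bit whole number; it also accepts k = 8 as the trivial case.

```python
def level_at_k_bits(level_8bit, k):
    """Re-express an 8-bit interface level at k bits by scaling by 2**(k - 8).

    level_8bit may carry fractional bits (e.g. 16.25 = 16 + 1/4).
    """
    if k < 8:
        raise ValueError("k must be at least 8")
    return level_8bit * 2 ** (k - 8)


# Interface levels from the text, re-expressed for a 10-bit interface
for name, level in [("luma black", 16), ("luma white", 235),
                    ("chroma min", 16), ("chroma zero", 128),
                    ("chroma max", 240)]:
    print(name, level_at_k_bits(level, 10))
```

At 10 bits this places luma black at 64 and white at 940, and chroma between 64 and 960 around 512.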



To obtain 8-bit Rec. 601 Y'CBCR from R'G'B' ranging
0 to 1, scale the rows of the matrix in Equation 25.3 by
the factors 219, 224, and 224, corresponding to the
excursions of each of Y', CB, and CR, respectively:

$$
\begin{bmatrix} {}^{219}_{601}Y' \\ C_B \\ C_R \end{bmatrix}
=
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
+
\begin{bmatrix} 65.481 & 128.553 & 24.966 \\ -37.797 & -74.203 & 112 \\ 112 & -93.786 & -18.214 \end{bmatrix}
\cdot
\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix}
\qquad \text{(Eq 25.6)}
$$



Summing the top row of this matrix yields 219, the
luma excursion. The lower two rows sum to zero. The
two entries of 112 reflect the positive CB and CR
extrema, at the blue and red primaries.
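Equation 25.6 translates directly into code. A minimal Python sketch, assuming gamma-corrected R'G'B' floats in [0, 1] and producing unrounded Y'CBCR values with the interface offsets included; the names are illustrative. Quantization to integers and clipping are deliberately left out.

```python
# Equation 25.6: Rec. 601 encoding matrix and interface offsets
M_25_6 = [
    [65.481, 128.553, 24.966],
    [-37.797, -74.203, 112.0],
    [112.0, -93.786, -18.214],
]
OFFSET = [16.0, 128.0, 128.0]


def rgb_to_ycbcr_601(r, g, b):
    """R'G'B' in [0, 1] -> 8-bit-scaled Rec. 601 (Y', CB, CR), unrounded."""
    return tuple(o + m[0] * r + m[1] * g + m[2] * b
                 for o, m in zip(OFFSET, M_25_6))


for name, rgb in [("black", (0, 0, 0)), ("white", (1, 1, 1)),
                  ("blue", (0, 0, 1)), ("red", (1, 0, 0))]:
    print(name, [round(x, 3) for x in rgb_to_ycbcr_601(*rgb)])
```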



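To recover R'G'B' in the range [0...+1] from 8-bit
Rec. 601 Y'CBCR, invert Equation 25.6:

$$
\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix}
=
\begin{bmatrix} 0.00456621 & 0 & 0.00625893 \\ 0.00456621 & -0.00153632 & -0.00318811 \\ 0.00456621 & 0.00791071 & 0 \end{bmatrix}
\cdot
\left(
\begin{bmatrix} {}^{219}_{601}Y' \\ C_B \\ C_R \end{bmatrix}
-
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
\right)
\qquad \text{(Eq 25.7)}
$$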
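A matching decoder, again a Python sketch with illustrative names, applies Equation 25.7 after removing the offsets. Its green-row CB coefficient is -0.344136/224 ≈ -0.00153632, from the Y'PBPR decoding matrix divided by the chroma excursion. Decoding the blue sample produced by Equation 25.6 should return approximately [0, 0, 1].

```python
# Equation 25.7: decode 8-bit Rec. 601 Y'CBCR to R'G'B' in [0, 1]
M_25_7 = [
    [0.00456621, 0.0, 0.00625893],
    [0.00456621, -0.00153632, -0.00318811],
    [0.00456621, 0.00791071, 0.0],
]
OFFSET = (16.0, 128.0, 128.0)


def ycbcr_to_rgb_601(y, cb, cr):
    """8-bit-scaled Rec. 601 (Y', CB, CR) -> R'G'B', nominally in [0, 1]."""
    d = (y - OFFSET[0], cb - OFFSET[1], cr - OFFSET[2])  # remove interface offsets
    return tuple(sum(c * x for c, x in zip(row, d)) for row in M_25_7)


print(ycbcr_to_rgb_601(235, 128, 128))         # reference white
print(ycbcr_to_rgb_601(40.966, 240, 109.786))  # blue, from Equation 25.6
```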





You can determine the excursion 
that an encoding matrix is 
designed to produce - often 1, 
219, 255, or 256 - by summing 
the coefficients in the top row. In 
Equation 25.8, the sum is 256. If 
you find an unexpected sum, 
suspect an error in the matrix. 



For implementation in binary arithmetic, the
multiplication by 1/256 can be accomplished by shifting. The
entries of 256 in this matrix indicate that the
corresponding component can simply be added; there is no
need for a multiplication operation. This matrix
contains entries larger than 256; the corresponding
multipliers will need capability for more than 8 bits.



When rounding the matrix coefficients, take care to 
preserve the intended row sums, in this case, [1, 0, 0]. 
You must take care to prevent overflow due to roundoff 
error or other conditions: Use saturating arithmetic. 



At the interface, after adding the offsets, clip all three 
components to the range 1 through 254 inclusive, to 
avoid the prohibited codes 0 and 255. 
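The advice about rounding and saturation can be illustrated with a fixed-point encoder. This Python sketch is an example under stated assumptions, not a reference implementation: it takes the coefficients obtained by scaling Equation 25.6 by 256/219 (studio R'G'B' integers in [0...219], in units of 1/256), rounds them to integers while forcing each row to keep its intended sum (256, 0, 0), shifts right by 8 with rounding, adds the offsets, and clips to 1 through 254. The helper names are hypothetical. Python integers never overflow, so in fixed-width hardware the same clip would be implemented with saturating arithmetic.

```python
# Studio-RGB coefficients in units of 1/256 (input R'G'B' in 0..219)
EQ_25_8 = [
    [76.544, 150.272, 29.184],
    [-44.182, -86.740, 130.922],
    [130.922, -109.631, -21.291],
]
ROW_SUMS = (256, 0, 0)
OFFSET = (16, 128, 128)


def round_preserving_sum(row, target):
    """Round each coefficient, then absorb any error in the largest-magnitude entry."""
    ints = [round(c) for c in row]
    error = target - sum(ints)
    if error:
        k = max(range(len(row)), key=lambda j: abs(row[j]))
        ints[k] += error
    return ints


INT_M = [round_preserving_sum(r, s) for r, s in zip(EQ_25_8, ROW_SUMS)]


def saturate(x, lo=1, hi=254):
    """Clip to the legal 8-bit range, avoiding prohibited codes 0 and 255."""
    return max(lo, min(hi, x))


def encode_studio_8bit(r, g, b):
    """Integer studio R'G'B' (0..219) -> 8-bit Rec. 601 (Y', CB, CR)."""
    out = []
    for off, (c0, c1, c2) in zip(OFFSET, INT_M):
        acc = c0 * r + c1 * g + c2 * b
        out.append(saturate(off + ((acc + 128) >> 8)))  # +128 rounds the shift
    return tuple(out)


print(INT_M)
print(encode_studio_8bit(219, 219, 219), encode_studio_8bit(0, 0, 219))
print(encode_studio_8bit(240, 240, 240))  # overshoot: luma saturates at 254
```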

Y'CBCR from studio RGB

In studio equipment, 8-bit R'G'B' components usually
have the same 219 excursion as the luma component of
Y'CBCR. To encode 8-bit Rec. 601 Y'CBCR from R'G'B'
in the range [0...219], scale the encoding matrix of
Equation 25.6 by 256/219:

$$
\begin{bmatrix} {}^{219}_{601}Y' \\ C_B \\ C_R \end{bmatrix}
=
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
+
\frac{1}{256}
\begin{bmatrix} 76.544 & 150.272 & 29.184 \\ -44.182 & -86.740 & 130.922 \\ 130.922 & -109.631 & -21.291 \end{bmatrix}
\cdot
\begin{bmatrix} {}^{219}R' \\ {}^{219}G' \\ {}^{219}B' \end{bmatrix}
\qquad \text{(Eq 25.8)}
$$



To decode to R'G'B' in the range [0...219] from 8-bit
Rec. 601 Y'CBCR, invert Equation 25.8:

$$
\begin{bmatrix} {}^{219}R' \\ {}^{219}G' \\ {}^{219}B' \end{bmatrix}
=
\frac{1}{256}
\begin{bmatrix} 256 & 0 & 350.901 \\ 256 & -86.132 & -178.738 \\ 256 & 443.506 & 0 \end{bmatrix}
\cdot
\left(
\begin{bmatrix} {}^{219}_{601}Y' \\ C_B \\ C_R \end{bmatrix}
-
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
\right)
\qquad \text{(Eq 25.9)}
$$



These transforms assume that the R'G'B' components 
incorporate gamma correction, such as that specified by 
Rec. 709; see page 276. 



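Y'CBCR from computer RGB

In computing it is conventional to use 8-bit R'G'B'
components, with no headroom and no footroom:
Black is at code 0 and white is at 255. To encode 8-bit
Rec. 601 Y'CBCR from R'G'B' in this range, scale the
matrix of Equation 25.6 by 256/255:

$$
\begin{bmatrix} {}^{219}_{601}Y' \\ C_B \\ C_R \end{bmatrix}
=
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
+
\frac{1}{256}
\begin{bmatrix} 65.738 & 129.057 & 25.064 \\ -37.945 & -74.494 & 112.439 \\ 112.439 & -94.154 & -18.285 \end{bmatrix}
\cdot
\begin{bmatrix} {}^{255}R' \\ {}^{255}G' \\ {}^{255}B' \end{bmatrix}
\qquad \text{(Eq 25.10)}
$$

To decode R'G'B' in the range [0...255] from 8-bit
Rec. 601 Y'CBCR, use the transform of Equation 25.11:

$$
\begin{bmatrix} {}^{255}R' \\ {}^{255}G' \\ {}^{255}B' \end{bmatrix}
=
\frac{1}{256}
\begin{bmatrix} 298.082 & 0 & 408.583 \\ 298.082 & -100.291 & -208.120 \\ 298.082 & 516.411 & 0 \end{bmatrix}
\cdot
\left(
\begin{bmatrix} {}^{219}_{601}Y' \\ C_B \\ C_R \end{bmatrix}
-
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
\right)
\qquad \text{(Eq 25.11)}
$$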
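For computer R'G'B', the same pattern applies with 256/255 scaling. This Python sketch uses encoding coefficients obtained by scaling Equation 25.6 by 256/255, and decoding coefficients from its inverse (entries in units of 1/256); the function names are illustrative. The decoder rounds and clips to [0, 255], which matters because codes in the Y'CBCR headroom or footroom, or chroma combinations outside the R'G'B' cube, would otherwise decode out of range.

```python
ENC_25_10 = [
    [65.738, 129.057, 25.064],
    [-37.945, -74.494, 112.439],
    [112.439, -94.154, -18.285],
]
DEC_25_11 = [
    [298.082, 0.0, 408.583],
    [298.082, -100.291, -208.120],
    [298.082, 516.411, 0.0],
]
OFFSET = (16, 128, 128)


def encode_from_computer_rgb(r, g, b):
    """Computer R'G'B' (0..255) -> Rec. 601 (Y', CB, CR), unrounded."""
    return tuple(o + (m[0] * r + m[1] * g + m[2] * b) / 256
                 for o, m in zip(OFFSET, ENC_25_10))


def decode_to_computer_rgb(y, cb, cr):
    """8-bit Rec. 601 (Y', CB, CR) -> computer R'G'B', rounded and clipped to 0..255."""
    d = (y - OFFSET[0], cb - OFFSET[1], cr - OFFSET[2])
    out = []
    for row in DEC_25_11:
        value = sum(c * x for c, x in zip(row, d)) / 256
        out.append(min(255, max(0, round(value))))
    return tuple(out)


ycc = encode_from_computer_rgb(0, 0, 255)
print([round(v, 3) for v in ycc])
print(decode_to_computer_rgb(*(round(v) for v in ycc)))
print(decode_to_computer_rgb(254, 128, 128))  # luma overshoot is clipped to 255
```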
Rec. 601 Y'CBCR uses the extremes of the coding range
to handle signal overshoot and undershoot. Clipping is
required when decoding to an R'G'B' range that has no
headroom or footroom.

[Figure 25.4 CBCR "full range" quantizer is used in JPEG/JFIF. Output codes span -128 to +127; code +128 is clipped.]


"Full-range" Y'CBCR

The Y'CBCR coding used in JPEG/JFIF stillframes in
computing conventionally has no footroom or headroom.
Luma (Y') is scaled to an excursion of 255 and
represented in 8 bits: Black is at code 0 and white is at
code 255. Obviously, luma codes 0 and 255 are not
prohibited! Color difference components are scaled to
an excursion of ±128, so each color difference
component nominally has an excursion 256/255 that of luma.
However, a mid-tread quantizer necessarily uses an odd
number of codes; to represent integers ranging ±128
takes 257 code values. In JPEG/JFIF, 8-bit codes are
used; neither CB code +128 (for example, at fully
saturated blue) nor CR code +128 (for example, at fully
saturated red) can be exactly represented. Figure 25.4
shows the transfer function of the color difference
quantizer, emphasizing that code +128 (pure blue, or
pure red) is clipped.

Figure 25.5 at the top of the facing page shows the
full-range CBCR color difference plane.
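The clipping of code +128 can be shown with a small quantizer sketch in Python. It assumes round-half-up (floor of x + 0.5) for the mid-tread quantizer and an 8-bit two's complement code range of -128 to +127; the function name is hypothetical.

```python
import math


def quantize_full_range_chroma(value):
    """Mid-tread quantizer for full-range chroma nominally spanning +/-128.

    Eight-bit two's complement holds -128..+127, so an input of +128
    (fully saturated blue in CB, or red in CR) cannot be represented and is clipped.
    """
    code = math.floor(value + 0.5)
    return max(-128, min(127, code))


for v in (-128.0, -0.4, 0.5, 64.6, 127.4, 128.0):
    print(v, "->", quantize_full_range_chroma(v))
```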



To encode from R'G'B' in the range [0...255] into 8-bit
Y'CBCR, with luma in the range [0...255] and CB and CR
each ranging ±128, use the transform in Equation 25.12:

$$
\begin{bmatrix} {}^{255}_{601}Y' \\ C_B \\ C_R \end{bmatrix}
=
\frac{1}{256}
\begin{bmatrix} 76.544 & 150.272 & 29.184 \\ -43.366 & -85.136 & 128.502 \\ 128.502 & -107.604 & -20.898 \end{bmatrix}
\cdot
\begin{bmatrix} {}^{255}R' \\ {}^{255}G' \\ {}^{255}B' \end{bmatrix}
\qquad \text{(Eq 25.12)}
$$

To decode into R'G'B' in the range [0...255] from
full-range 8-bit Y'CBCR, use the transform in Equation 25.13:

$$
\begin{bmatrix} {}^{255}R' \\ {}^{255}G' \\ {}^{255}B' \end{bmatrix}
=
\frac{1}{256}
\begin{bmatrix} 256 & 0 & 357.510 \\ 256 & -87.755 & -182.105 \\ 256 & 451.860 & 0 \end{bmatrix}
\cdot
\begin{bmatrix} {}^{255}_{601}Y' \\ C_B \\ C_R \end{bmatrix}
\qquad \text{(Eq 25.13)}
$$
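A short Python sketch shows the consequence of clipping for pure blue. The coefficients assume luma scaled to 255 and chroma to ±128 (that is, the rows of Equation 25.3 multiplied by 255 for luma and by 256/255 of that for chroma, in units of 1/256), with the decoder as their inverse; names are illustrative. The encoder produces CB ≈ +128, which must be clipped to +127 for 8-bit storage, and decoding then falls short of full blue. The clip stands in for the quantizer of Figure 25.4; rounding of the other components is omitted.

```python
ENC_25_12 = [
    [76.544, 150.272, 29.184],
    [-43.366, -85.136, 128.502],
    [128.502, -107.604, -20.898],
]
DEC_25_13 = [
    [256.0, 0.0, 357.510],
    [256.0, -87.755, -182.105],
    [256.0, 451.860, 0.0],
]


def mat_apply_256(m, v):
    """Apply a 3x3 matrix whose entries are in units of 1/256."""
    return tuple(sum(c * x for c, x in zip(row, v)) / 256 for row in m)


y, cb, cr = mat_apply_256(ENC_25_12, (0, 0, 255))  # pure blue
print(round(y, 3), round(cb, 3), round(cr, 3))

unclipped = mat_apply_256(DEC_25_13, (y, cb, cr))
clipped = mat_apply_256(DEC_25_13, (y, min(cb, 127), cr))
print([round(v, 2) for v in unclipped])  # B' comes back at about 255
print([round(v, 2) for v in clipped])    # B' falls short of 255
```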
[Figure 25.5 CBCR "full range" components used in JPEG/JFIF are shown, ranging from -128 to +127. Chroma code +128 is clipped so that fully saturated blue and red cannot be preserved. No provision is made for undershoot or overshoot. The accompanying luma signal ranges 0 through 255.]




Y'UV, Y'IQ confusion

I have detailed Y'PBPR and Y'CBCR. These are both based
on [B'-Y', R'-Y'] components, but they have different
scale factors suitable for component analog and
component digital interface, respectively.

In NTSC and PAL chroma modulation, on page 335, I will
describe [U, V] and [I, Q] color differences. These
components are also based on B'-Y' and R'-Y', but
have yet another set of scale factors. UV scaling (or IQ
scaling and rotation) is appropriate only when the
signals are destined for composite encoding, as in NTSC
or PAL.

Unfortunately, the notation Y'UV (or worse, YUV) is
sometimes loosely applied to any form of color
difference coding based on [B'-Y', R'-Y']. Do not be misled
by video equipment having connectors labelled Y'UV or
Y', B'-Y', R'-Y', or these symbols without primes, or by
JPEG being described as utilizing Y'UV coding. In fact
the analog connectors convey signals with Y'PBPR
scaling, and the JPEG standard itself specifies what
I would denote 255 Y'CBCR.
When the term Y'UV (or YUV) is encountered in
a computer graphics or image-processing context,
usually Rec. 601 Y'CBCR is meant, but beware!

Any image data supposedly coded to the original 1953
NTSC primaries is suspect, because it has been about
four decades since any equipment using these
primaries has been built.

Generally no mention is made of the transfer function
of the underlying R'G'B' components, and no account is
taken of the nonlinear formation of luma.

When the term Y'IQ (or YIQ) is encountered, beware!

Image data supposedly coded in Y'IQ is suspect since
no analog or digital interface for Y'IQ components has
ever been standardized.

Nearly all NTSC encoders and decoders built since 1970
have been based upon Y'UV components, not Y'IQ.
26 Component video color coding for HDTV



In the previous chapter, Component video color coding 
for SDTV, I detailed various component color coding 
systems that use the luma coefficients specified in 
Rec. 601. Unfortunately, for no good technical reason, 
Rec. 709 for HDTV standardizes different luma coeffi- 
cients. Deployment of HDTV requires upconversion and 
downconversion capabilities both at the studio and at 
consumers' premises; this situation will persist for a few 
decades. Owing to this aspect of conversion between 
HDTV and SDTV, if you want to be an HDTV expert, 
you have to be an SDTV expert as well! 

Today's computer imaging systems - for still frames, 
desktop video, and other applications - use the 
Rec. 601 parameters, independent of the image's pixel 
count ("resolution independence"). As I write, it isn't 
clear whether Rec. 601 or Rec. 709 coding will be used 
when computer systems start performing HDTV editing. 
To me, it is sensible to retain the Rec. 601 coefficients. 

In this chapter, I assume that you're familiar with the
concepts of Luma and color differences, described on
page 281. I will detail these component sets:

• B'-Y', R'-Y' components, the basis for PBPR and CBCR

• PBPR components, used for analog interface

• CBCR components, used for digital interface

B'-Y', R'-Y' components for Rec. 709 HDTV

The B'-Y' component reaches its positive maximum at
blue (R' = 0, G' = 0, B' = 1). With Rec. 709 luma
coefficients, the maximum of B'-Y' = +0.9278 occurs at
Y' = 0.0722. The B'-Y' component reaches its negative
maximum at yellow (B'-Y' = -0.9278). Analogously, the
extrema of R'-Y' occur at red and cyan at values
±0.7874 (see Figure 26.1). These are inconvenient
values for both digital and analog systems. The
709 Y'PBPR and 709 Y'CBCR systems to be described both
employ versions of [Y', B'-Y', R'-Y'] that are scaled to
place the extrema of the component values at more
convenient values.

To obtain [Y', B'-Y', R'-Y'] from R'G'B', for Rec. 709
luma coefficients, use this matrix equation:

$$
\begin{bmatrix} {}_{709}Y' \\ B'-{}_{709}Y' \\ R'-{}_{709}Y' \end{bmatrix}
=
\begin{bmatrix} 0.2126 & 0.7152 & 0.0722 \\ -0.2126 & -0.7152 & 0.9278 \\ 0.7874 & -0.7152 & -0.0722 \end{bmatrix}
\cdot
\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix}
\qquad \text{(Eq 26.1)}
$$
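The same corner check as for SDTV confirms the Rec. 709 extrema and the scale factors that follow from them. A minimal Python sketch, assuming R'G'B' floats in [0, 1]; `color_differences_709` is an illustrative name. The two scale factors should come out at 0.5/0.9278 = 1/1.8556 ≈ 0.5389 and 0.5/0.7874 = 1/1.5748 ≈ 0.6350.

```python
from itertools import product

# Rec. 709 luma coefficients (first row of Equation 26.1)
KR, KG, KB = 0.2126, 0.7152, 0.0722


def color_differences_709(r, g, b):
    """Return (B'-Y', R'-Y') for R'G'B' in [0, 1] with Rec. 709 luma."""
    y = KR * r + KG * g + KB * b
    return b - y, r - y


corners = list(product((0.0, 1.0), repeat=3))
b_y = [color_differences_709(*c)[0] for c in corners]
r_y = [color_differences_709(*c)[1] for c in corners]
print("B'-Y' extrema:", round(min(b_y), 4), round(max(b_y), 4))
print("R'-Y' extrema:", round(min(r_y), 4), round(max(r_y), 4))

# Scale factors that map the extrema to +/-0.5
pb_scale = 0.5 / max(b_y)
pr_scale = 0.5 / max(r_y)
print(round(pb_scale, 4), round(pr_scale, 4))
```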



PBPR components for Rec. 709 HDTV

If two color difference components are to be formed
having excursions identical to luma, then PB and PR
color difference components are used. For Rec. 709
luma, the equations are these:

$$
\begin{aligned}
{}_{709}P_B &= \frac{0.5}{1-0.0722}\left(B' - {}_{709}Y'\right) = \frac{1}{1.8556}\left(B' - {}_{709}Y'\right) \approx 0.5389\left(B' - {}_{709}Y'\right) \\
{}_{709}P_R &= \frac{0.5}{1-0.2126}\left(R' - {}_{709}Y'\right) = \frac{1}{1.5748}\left(R' - {}_{709}Y'\right) \approx 0.6350\left(R' - {}_{709}Y'\right)
\end{aligned}
\qquad \text{(Eq 26.2)}
$$

[Figure 26.2 PBPR components for Rec. 709 HDTV]

These scale factors limit the excursion of each color
difference component to the range -0.5 to +0.5 with
respect to unity luma excursion: 0.0722 in the first
expression above is the luma coefficient of blue, and
0.2126 in the second is for red. At an HDTV analog
interface, luma ranges from 0 mV (black) to 700 mV
(white), and PB and PR analog components range
±350 mV. Figure 26.2 above shows a plot of the
[PB, PR] plane.



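Expressed in matrix form, the B'-Y' and R'-Y' rows of
Equation 26.1 are scaled by 0.5/0.9278 and 0.5/0.7874.
To encode from R'G'B' where reference black is zero
and reference white is unity:

$$
\begin{bmatrix} {}_{709}Y' \\ {}_{709}P_B \\ {}_{709}P_R \end{bmatrix}
=
\begin{bmatrix} 0.2126 & 0.7152 & 0.0722 \\ -0.114572 & -0.385428 & 0.5 \\ 0.5 & -0.454153 & -0.045847 \end{bmatrix}
\cdot
\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix}
\qquad \text{(Eq 26.3)}
$$

The inverse, decoding matrix is this:

$$
\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 1.5748 \\ 1 & -0.187324 & -0.468124 \\ 1 & 1.8556 & 0 \end{bmatrix}
\cdot
\begin{bmatrix} {}_{709}Y' \\ {}_{709}P_B \\ {}_{709}P_R \end{bmatrix}
\qquad \text{(Eq 26.4)}
$$

[Figure 26.3 CBCR components for Rec. 709 HDTV are shown in their mathematical form. At an 8-bit interface, an offset of +128 is added to each color difference component.]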
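Rather than inverting the HDTV encoding matrix numerically, the decoding coefficients can be written in closed form from the luma coefficients: R' = Y' + 2(1-KR)PR, B' = Y' + 2(1-KB)PB, and G' is solved from the luma equation. This Python sketch (function names hypothetical) builds both matrices for any KR, KB pair and checks that their product is the identity for Rec. 709.

```python
def ypbpr_encode(kr, kb):
    """Y'PBPR encoding matrix (R'G'B' in [0, 1]) for luma coefficients kr, kb."""
    kg = 1.0 - kr - kb
    return [
        [kr, kg, kb],
        [-kr / (2 * (1 - kb)), -kg / (2 * (1 - kb)), 0.5],
        [0.5, -kg / (2 * (1 - kr)), -kb / (2 * (1 - kr))],
    ]


def ypbpr_decode(kr, kb):
    """Closed-form inverse: R' and B' directly, G' solved from the luma equation."""
    kg = 1.0 - kr - kb
    return [
        [1.0, 0.0, 2 * (1 - kr)],
        [1.0, -2 * kb * (1 - kb) / kg, -2 * kr * (1 - kr) / kg],
        [1.0, 2 * (1 - kb), 0.0],
    ]


def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


enc = ypbpr_encode(0.2126, 0.0722)
dec = ypbpr_decode(0.2126, 0.0722)
for row in dec:
    print(["%.6f" % x for x in row])
product_matrix = matmul3(dec, enc)  # should be the 3x3 identity
```

Passing 0.299 and 0.114 instead yields the SDTV matrices, which is one way to keep SDTV and HDTV code paths from diverging.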
CBCR components for Rec. 709 HDTV

709 Y'CBCR coding is used in component digital HDTV
equipment. In 8-bit systems, luma has an excursion of
219. Color differences CB and CR are coded in 8-bit
offset binary form, with excursions of ±112. The
[CB, CR] plane of HDTV is plotted in Figure 26.3.

In 8-bit systems, a luma offset of +16 is added at the
interface, placing black at code 16 and white at code
235; an offset of +128 is added to CB and CR, yielding
a range of 16 through 240 inclusive. (Following the
convention of the previous chapter, in the equations to
follow I set the offset terms apart.) HDTV standards
provide for 10-bit components, and 10-bit studio video
equipment is commonplace. In a 10-bit interface, the
8-bit interface levels and prohibited codes are
maintained; the extra two bits are appended as
least-significant bits to provide increased precision.
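To form 709 Y'CBCR from [Y', B'-Y', R'-Y'] components in
the range [0...+1], use these equations:

$$
\begin{aligned}
{}^{219}_{709}Y' &= 16 + 219\,{}_{709}Y' \\
C_B &= 128 + 112\,\frac{1}{0.9278}\left(B' - {}_{709}Y'\right) \\
C_R &= 128 + 112\,\frac{1}{0.7874}\left(R' - {}_{709}Y'\right)
\end{aligned}
\qquad \text{(Eq 26.5)}
$$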
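To obtain 709 Y'CBCR from R'G'B' ranging 0 to 1, scale
the rows of the matrix in Equation 26.3 by the factors
[219, 224, 224], corresponding to the excursions of
each of the components:

$$
\begin{bmatrix} {}^{219}_{709}Y' \\ C_B \\ C_R \end{bmatrix}
=
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
+
\begin{bmatrix} 46.559 & 156.629 & 15.812 \\ -25.664 & -86.336 & 112 \\ 112 & -101.730 & -10.270 \end{bmatrix}
\cdot
\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix}
\qquad \text{(Eq 26.6)}
$$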
Summing the first row of the matrix yields 219, the
luma excursion from black to white. The two entries of
112 reflect the positive CB and CR extrema at blue and red.
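The 8-bit Rec. 709 encoding matrix can be regenerated from the luma coefficients, which is a useful guard against transcription errors. A minimal Python sketch, with illustrative names, assuming R'G'B' floats in [0, 1]:

```python
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB

# Rows of the Y'PBPR encoding matrix scaled by the excursions 219, 224, 224
M_26_6 = [
    [219 * KR, 219 * KG, 219 * KB],
    [224 * -KR / (2 * (1 - KB)), 224 * -KG / (2 * (1 - KB)), 112.0],
    [112.0, 224 * -KG / (2 * (1 - KR)), 224 * -KB / (2 * (1 - KR))],
]
OFFSET = (16.0, 128.0, 128.0)


def rgb_to_ycbcr_709(r, g, b):
    """R'G'B' in [0, 1] -> 8-bit-scaled Rec. 709 (Y', CB, CR), unrounded."""
    return tuple(o + m[0] * r + m[1] * g + m[2] * b
                 for o, m in zip(OFFSET, M_26_6))


for row in M_26_6:
    print(["%.3f" % c for c in row], "sum = %.3f" % sum(row))
print([round(v, 3) for v in rgb_to_ycbcr_709(0, 0, 1)])  # pure blue
```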



To recover R'G'B' in the range [0...+1] from 709 Y'CBCR,
use the inverse of Equation 26.6:

$$
\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix}
=
\begin{bmatrix} 0.00456621 & 0 & 0.00703036 \\ 0.00456621 & -0.00083627 & -0.00208984 \\ 0.00456621 & 0.00828393 & 0 \end{bmatrix}
\cdot
\left(
\begin{bmatrix} {}^{219}_{709}Y' \\ C_B \\ C_R \end{bmatrix}
-
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
\right)
\qquad \text{(Eq 26.7)}
$$



The 709 Y"C b C r components are integers in 8 bits; recon- 
structed R'G'B’ is scaled to the range [0...+1]. 



Figure 24.2 (on page 285) illustrated that when R'G'B' components are transformed to luma and color differences, the unit R'G'B' cube occupies only a small fraction of the volume of the enclosing cube. In digital video, only about 1/4 of Y'CBCR codewords correspond to R'G'B' values between zero and unity. Certain signal-processing operations (such as filtering) may produce Y'CBCR codewords that lie outside the RGB-legal cube. These codewords cause no difficulty in the Y'CBCR domain, but potentially present a problem when decoded to R'G'B'. Generally, R'G'B' values are clipped between 0 and 1.
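The decoding path, followed by clipping, can be sketched as follows. The sketch assumes 8-bit values with the +16/+128 offsets present and uses plain floating-point arithmetic; the helper name decode_709 is hypothetical. Instead of copying the printed coefficients, it derives them from the Rec. 709 luma coefficients. This is how an entry such as 0.00703036 = 1.5748/224 arises.

```python
# Sketch: decode 8-bit 709 Y'CbCr (offsets present) to R'G'B' in [0, 1],
# then clip to the unit R'G'B' cube.
KR, KB = 0.2126, 0.0722          # Rec. 709 luma coefficients for red and blue
KG = 1.0 - KR - KB               # 0.7152

def decode_709(y219, cb, cr, clip=True):
    """Hypothetical helper: 8-bit 709 Y'CbCr -> R'G'B' in [0, 1]."""
    y = (y219 - 16) / 219.0      # remove offset, normalize luma to [0, 1]
    pb = (cb - 128) / 224.0      # normalize color differences to [-0.5, +0.5]
    pr = (cr - 128) / 224.0
    r = y + 2 * (1 - KR) * pr    # R' = Y' + 1.5748 PR
    b = y + 2 * (1 - KB) * pb    # B' = Y' + 1.8556 PB
    g = (y - KR * r - KB * b) / KG   # solve Y' = KR R' + KG G' + KB B' for G'
    rgb = (r, g, b)
    if clip:
        rgb = tuple(min(max(v, 0.0), 1.0) for v in rgb)
    return rgb

print(decode_709(235, 128, 128))             # reference white: all three near 1.0
print(decode_709(16, 240, 240, clip=False))  # Y'CbCr-valid, but G' is negative
```

The second call shows a codeword that is perfectly legal in Y'CBCR but lies outside the RGB-legal cube. Clipping forces its negative G' to zero.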
Y'CBCR components for Rec. 1361 HDTV

One method of extending the color gamut of an R'G'B' system is to allow components to excurse below zero and above unity. In Rec. 1361 transfer function, on page 265, I explained one approach. Rec. 1361 is based upon Rec. 709 primaries, but enables the RGB tristimulus components to excurse from -1/4 to +4/3.

When transformed to Rec. 709 Y'CBCR, all of the real surface colors documented by Pointer (that is, all the colors in Pointer's gamut) produce values that are Y'CBCR-valid. Though Rec. 1361 was needed to specify the R'G'B' representation of wide-gamut colors, no special provisions are necessary to carry those colors across a 709Y'CBCR interface. The notation "Rec. 1361 Y'CBCR," or as I write it, 1361Y'CBCR, makes it explicit that codewords outside the unit R'G'B' cube are to be interpreted as wide-gamut colors, instead of being treated as RGB-illegal.

Concerning Pointer, see the marginal note on page 255.

Equipment conforming to Rec. 1361 is not yet 
deployed, and is not anticipated for several years. 
Wide-gamut acquisition and production equipment will 
begin to replace film over the next decade or so; 
however, wide-gamut consumer displays are not 
expected in that time frame. When these begin to be 
deployed, it is unlikely that they will all have the same 
gamut; electronics associated with each display will 
have to process the color signals according to the
properties of each display. In the long term, gamut mapping
strategies comparable to those in the desktop color 
management community will have to be deployed. 

Y'CBCR from studio RGB

In studio equipment, 8-bit R'G'B' components usually use the same 219 excursion as the luma component of Y'CBCR. To encode Y'CBCR from R'G'B' in the range [0...219] using 8-bit binary arithmetic, scale the encoding matrix of Equation 26.6 by 256/219:
Eq 26.8

$$
\begin{bmatrix} {}_{709}Y'_{219} \\ C_B \\ C_R \end{bmatrix}
=
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
+
\frac{1}{256}
\begin{bmatrix}
54.426 & 183.091 & 18.483 \\
-30.000 & -100.922 & 130.922 \\
130.922 & -118.918 & -12.005
\end{bmatrix}
\begin{bmatrix} 219R' \\ 219G' \\ 219B' \end{bmatrix}
$$
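To decode to R'G'B' in the range [0...219] from Y'CBCR using 8-bit binary arithmetic:

Eq 26.9

$$
\begin{bmatrix} 219R' \\ 219G' \\ 219B' \end{bmatrix}
=
\frac{1}{256}
\begin{bmatrix}
256 & 0 & 394.150 \\
256 & -46.885 & -117.165 \\
256 & 464.430 & 0
\end{bmatrix}
\left(
\begin{bmatrix} {}_{709}Y'_{219} \\ C_B \\ C_R \end{bmatrix}
-
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
\right)
$$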
Y'CBCR from computer RGB

In computing it is conventional to use 8-bit R'G'B' components, with no headroom or footroom: Black is at code 0 and white is at 255. To encode Y'CBCR from R'G'B' in the range [0...255] using 8-bit binary arithmetic, the matrix of Equation 26.6 is scaled by 256/255:





Eq 26.10

$$
\begin{bmatrix} {}_{709}Y'_{219} \\ C_B \\ C_R \end{bmatrix}
=
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
+
\frac{1}{256}
\begin{bmatrix}
46.742 & 157.243 & 15.874 \\
-25.765 & -86.674 & 112.439 \\
112.439 & -102.129 & -10.310
\end{bmatrix}
\begin{bmatrix} 255R' \\ 255G' \\ 255B' \end{bmatrix}
$$



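To decode R'G'B' in the range [0...255] from 709Y'CBCR using 8-bit binary arithmetic:

Eq 26.11

$$
\begin{bmatrix} 255R' \\ 255G' \\ 255B' \end{bmatrix}
=
\frac{1}{256}
\begin{bmatrix}
298.082 & 0 & 458.942 \\
298.082 & -54.592 & -136.425 \\
298.082 & 540.775 & 0
\end{bmatrix}
\left(
\begin{bmatrix} {}_{709}Y'_{219} \\ C_B \\ C_R \end{bmatrix}
-
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
\right)
$$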
Conversions between HDTV and SDTV

The differences among the EBU, SMPTE, and Rec. 709 
primaries are negligible for practical purposes. New 
equipment should be designed to Rec. 709. Also, SDTV 
and HDTV have effectively converged to the transfer 
function specified in Rec. 709. Consequently, R'G'B' 
coding uses essentially identical parameters worldwide, 
for SDTV and HDTV. (The sRGB standard for desktop 
computing uses the primaries of Rec. 709, but uses 
a different transfer function.) 

Unfortunately, as I have mentioned, the luma coeffi- 
cients differ dramatically between SDTV and HDTV. This 
wouldn't matter if HDTV systems were isolated! 
However, in practice, SDTV is upconverted and HDTV is 
downconverted, both at the studio and at consumers' 
premises. Serious color reproduction errors arise if
differences among luma coefficients are not taken into
account in conversions.



In principle, downconversion can be accomplished by decoding 709Y'CBCR to R'G'B' using a suitable 3×3 matrix (such as that in Equation 26.7, on page 317), then encoding R'G'B' to 601Y'CBCR using another 3×3 matrix (such as that in Equation 25.6, on page 308). The two 3×3 matrices can be combined so that the conversion can take place in one step:

Eq 26.12

$$
\begin{bmatrix} {}_{601}Y'_{219} \\ C_B \\ C_R \end{bmatrix}
=
\begin{bmatrix}
1 & 0.101579 & 0.196076 \\
0 & 0.989854 & -0.110653 \\
0 & -0.072453 & 0.983398
\end{bmatrix}
\begin{bmatrix} {}_{709}Y'_{219} \\ C_B \\ C_R \end{bmatrix}
$$

In the first row of the matrix, the coefficient 0.101579 adds about one tenth of Rec. 709's CB into Rec. 601's luma, as a consequence of Rec. 709's blue luma coefficient being just 0.0722, compared to 0.114 for Rec. 601. The coefficient 0.196076 adds about one fifth of Rec. 709's CR into Rec. 601's luma, as a consequence of Rec. 709's red luma coefficient being 0.2126, compared to 0.299 for Rec. 601. Clearly, failure to perform this color transform produces large color errors.



To convert from SD to HD, the matrix of Equation 26.12 is inverted:

Eq 26.13

$$
\begin{bmatrix} {}_{709}Y'_{219} \\ C_B \\ C_R \end{bmatrix}
=
\begin{bmatrix}
1 & -0.118188 & -0.212685 \\
0 & 1.018640 & 0.114618 \\
0 & 0.075049 & 1.025327
\end{bmatrix}
\begin{bmatrix} {}_{601}Y'_{219} \\ C_B \\ C_R \end{bmatrix}
$$

Equations 26.12 and 26.13 are written without interface offsets of +16 for luma and +128 for CB and CR: If these are present, remove them, transform, and reapply them.
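The combined HD-to-SD matrix can be checked numerically. The following is a minimal sketch that composes a Rec. 709 Y'PBPR decoder with a Rec. 601 Y'PBPR encoder, and then composes the two in the opposite order for SD to HD. The helper names are hypothetical. The sketch works in normalized units (Y' in [0, 1], PB and PR in ±0.5), where it reproduces the coefficients of Equations 26.12 and 26.13. If the same matrices are applied to 8-bit values with offsets removed, the chroma rows are unchanged, but the two off-diagonal luma terms pick up a factor of 219/224. That factor is the ratio of the luma and chroma excursions.

```python
# Sketch: derive the HD-to-SD matrix (Eq 26.12) by composing a Rec. 709
# Y'PbPr decoder with a Rec. 601 Y'PbPr encoder; SD-to-HD (Eq 26.13) reverses it.

def encoder(kr, kb):
    """Y'PbPr from R'G'B' for luma coefficients kr, kb (kg = 1 - kr - kb)."""
    kg = 1 - kr - kb
    return [[kr, kg, kb],
            [-kr / (2 * (1 - kb)), -kg / (2 * (1 - kb)), 0.5],
            [0.5, -kg / (2 * (1 - kr)), -kb / (2 * (1 - kr))]]

def decoder(kr, kb):
    """R'G'B' from Y'PbPr: the exact inverse of encoder(kr, kb)."""
    kg = 1 - kr - kb
    return [[1, 0, 2 * (1 - kr)],
            [1, -2 * kb * (1 - kb) / kg, -2 * kr * (1 - kr) / kg],
            [1, 2 * (1 - kb), 0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

REC601, REC709 = (0.299, 0.114), (0.2126, 0.0722)
HD_TO_SD = matmul(encoder(*REC601), decoder(*REC709))   # Eq 26.12
SD_TO_HD = matmul(encoder(*REC709), decoder(*REC601))   # Eq 26.13

for row in HD_TO_SD:
    print("  ".join(f"{c:10.6f}" for c in row))
```

Because each decoder is the exact inverse of its encoder, multiplying HD_TO_SD by SD_TO_HD returns the identity to within floating-point rounding.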



Unfortunately, upconverting or downconverting a subsampled representation such as 4:2:2 or 4:2:0 requires chroma interpolation, then color transformation, then chroma subsampling. This is computationally intensive.
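The three steps can be sketched for a single 4:2:2 row. This is a minimal sketch, assuming components that are normalized with offsets removed and chroma samples that are cosited with even luma samples. The interpolation (averaging of neighbors) and the decimation filter ([1, 2, 1]/4) are deliberately simple stand-ins, not production-quality filters. Samples beyond the row are taken as zero, which is neutral for color differences. All function names are hypothetical.

```python
# Sketch: HD-to-SD conversion of one 4:2:2 row: interpolate chroma to full
# rate, apply the 3x3 matrix per sample, then subsample chroma again.
# Normalized components, offsets removed: Y' in [0, 1], PB/PR in [-0.5, +0.5].

HD_TO_SD = [[1.0, 0.101579, 0.196076],      # coefficients of Eq 26.12
            [0.0, 0.989854, -0.110653],
            [0.0, -0.072453, 0.983398]]

def upsample_chroma(c):
    """4:2:2 -> 4:4:4: keep cosited samples, average neighbors for the others."""
    out = []
    for i, v in enumerate(c):
        nxt = c[i + 1] if i + 1 < len(c) else 0.0   # beyond the row: zero (neutral)
        out.extend([v, (v + nxt) / 2])
    return out

def downsample_chroma(c):
    """4:4:4 -> 4:2:2: [1, 2, 1]/4 lowpass, then keep cosited (even) samples."""
    def at(i):
        return c[i] if 0 <= i < len(c) else 0.0
    return [(at(i - 1) + 2 * at(i) + at(i + 1)) / 4 for i in range(0, len(c), 2)]

def convert_422_row(y, pb, pr, m=HD_TO_SD):
    pb4, pr4 = upsample_chroma(pb), upsample_chroma(pr)          # 1. interpolate
    out = [[sum(m[i][j] * v for j, v in enumerate(s)) for i in range(3)]
           for s in zip(y, pb4, pr4)]                            # 2. transform
    return ([s[0] for s in out],                                 # luma stays full rate
            downsample_chroma([s[1] for s in out]),              # 3. subsample
            downsample_chroma([s[2] for s in out]))

y_out, pb_out, pr_out = convert_422_row([0.5] * 8, [0.1] * 4, [0.0] * 4)
print(y_out, pb_out, pr_out)
```

The per-sample matrix multiply in step 2 runs at the full luma rate, which is part of why the text calls this computationally intensive.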



SMPTE 240M-1988 luma

The coding systems that I have described are based upon the luma coefficients of Rec. 709. Before Rec. 709 was established, SMPTE 240M-1988 for 1035i30 HDTV established luma coefficients based upon the SMPTE RP 145 primaries. In 1990, international agreement was reached in Rec. 709 on a new transfer function, a new set of primaries, and new luma coefficients. SMPTE 274M for 1920×1080 HDTV adopted the Rec. 709 parameters, but made provisions for the SMPTE 240M parameters as the "interim implementation." Much equipment has been deployed using the 240M parameters. The most recent revision of SMPTE 274M eliminates the provision for the "interim implementation," and specifies Rec. 709 parameters.

For details of SMPTE 240M-1988 luma, see page 292.

I recommend that you use the Rec. 709 parameters for 
new equipment. However, if your equipment must 
interoperate with the "interim implementation," or with 
SDTV equipment, you must pay very careful attention 
to conversion. Although the differences in transfer func- 
tions and primary sets are evident in test signals, they 
are negligible for actual pictures. However, the differ- 
ences among luma coefficients are significant. 



To convert from legacy SMPTE 240M Y'CBCR components to Rec. 709 Y'CBCR, use this transform:
-0.003437
1.014800







ITU-R Rec. BT.709, Basic param- 
eter values for the HDTV standard 
for the studio and for international 
programme exchange. 

SMPTE 274M, 1920×1080 Scanning and Analog and Parallel Digital Interfaces for Multiple Picture Rates.

SMPTE 296M, 1280×720 Progressive Image Sample Structure - Analog and Digital Representation and Analog Interface.



Color coding standards 

ITU-R Rec. BT.709 defines Y'PBPR for component analog HDTV and Y'CBCR for component digital HDTV.

The parameters of Y'PBPR and Y'CBCR for the 1280×720 and 1920×1080 systems are defined by the SMPTE standards cited in the margin.
Video signal processing



27 



This chapter presents several diverse topics concerning 
the representation and processing of video signals. 




Figure 27.1 Transition samples. The solid line, dots (•), and light shading show the luma transition; the dashed line, open circles (o), and heavy shading show 4:2:2 chroma.



Transition samples 

In Scanning parameters, on page 54, I mentioned that it 
is necessary to avoid an instantaneous transition from 
blanking to picture at the start of a line. It is also neces- 
sary to avoid an instantaneous transition from picture to 
blanking at the end of a line. In studio video, the first 
and the last few active video samples on a line are 
blanking transition samples. I recommend that the first
luma (Y') sample of a line be black, and that this sample
be followed by three transition samples clipped to 10%,
50%, and 90% of the full signal amplitude. In 4:2:2,
I recommend that the first three color difference (C)
samples on a line be transition samples, clipped to 
10%, 50%, and 90%. Figure 27.1 sketches the transi- 
tion samples. The transition values should be applied by 
clipping, rather than by multiplication, to avoid 
disturbing the transition samples of a signal that already 
has a proper blanking transition. 
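The clipping approach can be sketched as follows. This is a minimal sketch on normalized values, with luma running from 0 (black) to 1 (white) and color differences spanning ±0.5. Two points are assumptions of the sketch rather than statements in the text: the end of the line mirrors the start, and chroma is clipped in magnitude (keeping its sign). The function names are hypothetical.

```python
# Sketch: impose blanking transitions on one row by clipping, not scaling.
# Luma is normalized (0 = black, 1 = white); color differences span -0.5..+0.5.
# Mirroring the start-of-line transition at the end of the line is an assumption.

LUMA_LIMITS = (0.0, 0.10, 0.50, 0.90)     # first sample black, then 10%, 50%, 90%
CHROMA_FRACTIONS = (0.10, 0.50, 0.90)     # first three color difference samples

def luma_transition(row, limits=LUMA_LIMITS):
    out = list(row)
    for k, lim in enumerate(limits):
        out[k] = min(out[k], lim)             # start of line
        out[-1 - k] = min(out[-1 - k], lim)   # end of line, mirrored
    return out

def chroma_transition(row, fractions=CHROMA_FRACTIONS, peak=0.5):
    out = list(row)
    for k, f in enumerate(fractions):
        lim = f * peak
        for idx in (k, -1 - k):
            out[idx] = max(-lim, min(out[idx], lim))   # clip magnitude, keep sign
    return out

row = luma_transition([1.0] * 10)
print(row)                          # [0.0, 0.1, 0.5, 0.9, 1.0, 1.0, 0.9, 0.5, 0.1, 0.0]
print(luma_transition(row) == row)  # True: clipping leaves a proper transition alone
```

The second print shows why clipping is preferred over multiplication. Applying the operation again changes nothing. Multiplying by 0.1, 0.5, and 0.9 a second time would instead steepen a transition that was already correct.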



Edge treatment 

If an image row of 720 samples is to be processed 
through a 25-tap FIR filter (such as that of Figure 16.26, 
on page 167) to produce 720 output samples, the 
calculations for 12 output samples at each end of the 
line will refer to input samples outside the image. One 
approach to this problem is to produce just those 
output samples - 696 in this example - that can be 
computed from the available input samples. However, filtering operations are frequently cascaded, particularly in the studio, and it is unacceptable to repeatedly narrow the image width upon application of a sequence of FIR filters. A strategy is necessary to deal with filtering at the edges of the image.

See Scanning parameters, on page 54, and Transition samples, on page 323.

In 480i line assignment, on page 500, I will detail how 480i studio standards provide up to 487 picture lines. In 576i line assignment, on page 520, I detail how 576i studio standards provide 574 full lines and two halflines.

Some digital image-processing textbooks advocate 
considering the area outside the pixel array to contain 
replicated edge samples. I consider this to be quite 
unrealistic, because a small feature that happens to lie 
at the edge of the image will exert undue influence into 
the interior of the pixel array. Other digital image- 
processing textbooks consider the image to wrap in 
a cylinder: Missing samples outside the left-hand edge 
of the image are copied from the right-hand edge of the 
image! This concept draws from Fourier transform 
theory, where a finite data set is treated as being cyclic. 
In practice, I consider the wrapping strategy to be even 
worse than edge replication. 

In video, we treat the image as lying on a field of black: 
Unavailable samples are taken to be zero. With this 
strategy, repeated lowpass filtering causes the implicit 
black background to intrude to some extent into the 
image. In practice, few problems are caused by this 
intrusion. Video image data nearly always includes 
some black (or blanking) samples, as I outlined in the 
discussion of samples per picture width and samples 
per active line. In studio standards, a region lying within 
the pixel array is designated as the clean aperture, as 
sketched in Figure 6.4, on page 55. This region is 
supposed to remain subjectively free from artifacts that 
originate from filtering at the picture edges. 
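The black-field convention and the 696-sample count can be sketched together. This is a minimal sketch: the 25-tap boxcar below is a hypothetical filter chosen only for illustration (it is not the filter of Figure 16.26), and the function names are also hypothetical. Because the taps are symmetric, the correlation computed here is the same as convolution.

```python
# Sketch: filter one image row with an odd-length FIR filter while keeping
# the output as wide as the input, treating samples outside the row as black (0).

def fir_black_border(row, taps):
    n, half = len(row), len(taps) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half            # input sample under tap k
            if 0 <= j < n:              # outside the row: implicit black, adds nothing
                acc += t * row[j]
        out.append(acc)
    return out

def fully_supported_outputs(n, ntaps):
    """Outputs computable entirely from samples inside the row."""
    return n - (ntaps - 1)

taps = [1 / 25] * 25                   # hypothetical 25-tap boxcar, for illustration
flat = fir_black_border([0.5] * 720, taps)
print(len(flat), fully_supported_outputs(720, 25))   # 720 696
print(round(flat[0], 3), round(flat[360], 3))        # 0.26 0.5: black intrudes at the edge
```

The edge sample of a flat mid-gray row drops from 0.5 to 0.26. This is the intrusion of the implicit black background that the clean aperture is meant to keep away from the visible picture.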

Picture lines 

Historically, the count of picture lines (image rows, LA) has been poorly standardized in 480i systems. Various standards have specified between 480 and 487 picture lines. It is pointless to carry picture on line 21/284 or earlier, because in NTSC transmission this line is reserved for closed caption data: 482 full lines, plus the bottom halfline, now suffice. With 4:2:0 chroma subsampling, as used in JPEG, MPEG-1, and MPEG-2, a multiple of 16 picture lines is required. MPEG compression is now so important that a count of 480 lines has become de rigueur for 525-line MPEG video. In 576i scanning, a rigid standard of 576 picture lines has always been enforced; fortuitously for MPEG in 576i, this is a multiple of 16.

Active lines (vertically) encompass the picture height. Active samples (horizontally) encompass not only the picture width, but also up to about a dozen blanking transition samples.

MPEG-2 accommodates the 1920x1080 image format, 
but 1080 is not a multiple of 16. In MPEG-2 coding, 
eight black lines are appended to the bottom of each 
1920x1080 picture, to form a 1920x1088 array that is 
coded. The extra 8 lines are discarded upon decoding. 
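The padding arithmetic can be sketched in a few lines. This is a minimal sketch: the function names are hypothetical, and filling the appended rows with 8-bit luma code 16 (black) is an assumption. The sketch handles only the luma plane.

```python
# Sketch: round picture height up to whole 16-line macroblock rows, and pad
# the luma plane with black rows, as in MPEG-2 coding of 1080-line pictures.

def coded_height(picture_lines, mb=16):
    """Smallest multiple of mb that is not less than picture_lines."""
    return -(-picture_lines // mb) * mb

def pad_to_macroblocks(rows, black=16, mb=16):
    """Append black rows (8-bit luma code 16 assumed) below the picture."""
    width = len(rows[0])
    extra = coded_height(len(rows), mb) - len(rows)
    return rows + [[black] * width for _ in range(extra)]

for h in (480, 576, 1080):
    print(h, coded_height(h))     # 480 480, 576 576, 1080 1088
```

A decoder would simply drop everything past the original picture height, which recovers the 1080 picture lines.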

Traditionally, the image array of 480i and 576i systems
had halflines, as sketched in Figures 11.3 and 11.4 on 
page 98: Halfline blanking was imposed on picture 
information on the top and bottom lines of each frame. 
Neither JPEG nor MPEG provides halfline blanking: 
When halfline-blanked image data is presented to 
a JPEG or MPEG compressor, the blank image data is 
compressed. Halflines have been abolished from HDTV. 

Studio video standards have no transition samples on 
the vertical axis: An instantaneous transition from 
vertical blanking to full picture is implied. However, 
nonpicture vertical interval information coded like 
video - such as VITS or VITC - may precede the picture 
lines in a field or frame. Active lines comprise only
picture lines (and exceptionally, in 480i systems, closed
caption data). LA excludes vertical interval lines.

Computer monitor interface standards, such as those 
from VESA, make no provision for nonpicture (vertical 
interval) lines other than blanking. 

Choice of SAL and SPW parameters

In Scanning parameters, on page 54, I characterized two
video signal parameters, samples per active line (SAL)
and samples per picture width (SPW). Active sample
counts in studio standards have been chosen for the 
convenience of system design; within a given scanning 
standard, active sample counts standardized for 
different sampling frequencies are not exactly propor- 
tional to the sampling frequencies. 
HDTV standards specify that the 50%-points of picture width must lie no further than 6 samples inside the production aperture.

720 SAL accommodates 714 SPW, corresponding to 10.7 µs of analog blanking in the studio, conformant to SMPTE 170M.

704 SPW corresponds to 11.4 µs of analog blanking, outside SMPTE 170M.

In 480i systems with setup, picture excursion refers to the range from blanking to white, even though strictly speaking the lowest level of the picture signal is 7.5 IRE, not 0 IRE.



Historically, "blanking width" was measured instead of 
picture width. Through the decades, there has been 
considerable variation in blanking width of studio stan- 
dards and broadcast standards. Also, blanking width 
was measured at levels other than 50%, leading to an 
unfortunate dependency upon frequency response. 

Most modern video standards do not specify picture 
width: It is implicit that the picture should be as wide 
as possible within the production aperture, subject to 
reasonable blanking transitions. Figure 11.1, on page 96,
indicates SAL values typical of studio practice.

For digital terrestrial broadcasting of 480i and 480p, the ATSC considered the coding of transition samples to be wasteful. Instead of specifying 720 SAL, ATSC established 704 SAL. This created an inconsistency between production standards and broadcast standards: MPEG-2 macroblocks are misaligned between the two.
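The blanking durations quoted in the margin follow from simple arithmetic. This is a minimal sketch, assuming 480i sampled at 13.5 MHz with the Rec. 601 total of 858 sample periods per line; that total does not appear in this section. The function name is hypothetical.

```python
# Sketch: analog horizontal blanking implied by a picture width, for 480i
# sampled at 13.5 MHz, assuming the Rec. 601 total of 858 samples per line.

FS_MHZ = 13.5     # luma sampling frequency, MHz
S_TL = 858        # total samples per line (480i at 13.5 MHz), an assumption here

def blanking_us(s_pw, s_tl=S_TL, fs_mhz=FS_MHZ):
    """Line time not covered by picture width, in microseconds."""
    return (s_tl - s_pw) / fs_mhz

for s_pw in (720, 714, 704):
    print(s_pw, round(blanking_us(s_pw), 1))   # 720 10.2, 714 10.7, 704 11.4
```

The 714 and 704 results match the 10.7 µs and 11.4 µs figures given in the margin.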

Computer monitor interface standards, such as those 
from VESA, do not accommodate blanking transition 
samples. In these standards, SPW and SAL are equal.

Video levels 

I introduced 8-bit studio video levels on page 22. 
Studio video coding provides headroom and footroom. 
At an 8-bit interface, luma has reference black at 
code 16 and reference white at code 235; color differ- 
ences are coded in offset binary, with zero at code 128, 
the negative reference at code 16, and the positive 
reference at code 240. (It is a nuisance that the posi- 
tive reference levels differ between luma and chroma.) 

I use the term reference instead of peak: the peaks of 
transient excursions may lie outside the reference 
levels. All studio interfaces today accommodate 10-bit 
signals, and most equipment today implements 10 bits. 
In 10-bit systems, the reference levels just mentioned 
are multiplied by 4; the two LSBs add precision. 
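The level scheme can be sketched as a small quantizer. This is a minimal sketch that handles 8-bit and 10-bit interfaces only. It assumes the reserved (prohibited) codes are 0 and 255 at 8 bits, and 0 through 3 and 1020 through 1023 at 10 bits, consistent with keeping the 8-bit prohibited codes at a 10-bit interface. The function name is hypothetical.

```python
# Sketch: quantize normalized Y' [0, 1] and PB, PR [-0.5, +0.5] to studio
# interface codes at 8 or 10 bits, keeping out of the reserved codes.

def studio_codes(y, pb, pr, bits=8):
    scale = 1 << (bits - 8)            # 1 at 8 bits, 4 at 10 bits
    lo, hi = scale, 255 * scale - 1    # 8-bit: 1..254; 10-bit: 4..1019
    def q(level8):
        # Scale the 8-bit level first, then round, so 10-bit keeps its two extra LSBs.
        return min(max(round(level8 * scale), lo), hi)
    return q(16 + 219 * y), q(128 + 224 * pb), q(128 + 224 * pr)

print(studio_codes(1.0, 0.0, 0.0))           # (235, 128, 128)
print(studio_codes(1.0, 0.0, 0.0, bits=10))  # (940, 512, 512)
```

Rounding after scaling is the point of the design. Rounding to 8 bits first and then multiplying by 4 would produce only every fourth 10-bit code.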

Video levels in 480i systems are expressed in IRE units, sometimes simply called units. IRE refers to the Institute of Radio Engineers in the United States, the predecessor of the IEEE. Reference blanking level is defined as 0 IRE; reference white level is 100 IRE. The range between these values is the picture excursion.

Back porch is described in Analog horizontal blanking interval, on page 405.

In the analog domain, sync is coded at a voltage level more negative than black; sync is "blacker than black!" The ratio of picture excursion to sync amplitude is the picture:sync ratio. Two different ratios are standard: 10:4 is predominant in 480i and computing; 7:3 is universal in 576i and HDTV, occasionally used in 480i, and rarely used in computing.

Setup (pedestal) 

In 480i composite NTSC video in North America, reference black is offset above blanking by 7.5% (3/40) of the picture excursion. Setup refers to this offset, expressed as a fraction or percentage of the picture excursion. In a 480i system with setup, there are nominally 92.5 IRE units from black to white.
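The three level conventions can be compared numerically. This is a minimal sketch, assuming a 1 V total for both picture:sync conventions. For 10:4 that gives about 714.3 mV of picture and 285.7 mV of sync; for 7:3 it gives 700 mV and 300 mV. The function names are hypothetical.

```python
# Sketch: analog levels (mV, relative to blanking) for a picture:sync ratio
# and a setup fraction, assuming a 1 V total from sync tip to white.

def analog_levels(picture_parts, sync_parts, setup=0.0, total_mv=1000.0):
    picture = total_mv * picture_parts / (picture_parts + sync_parts)
    return {"sync": picture - total_mv,            # sync tip, below blanking
            "blanking": 0.0,
            "black": setup * picture,              # setup lifts black above blanking
            "white": picture}

def black_to_white_ire(levels):
    return 100.0 * (levels["white"] - levels["black"]) / levels["white"]

ntsc = analog_levels(10, 4, setup=0.075)   # 480i, North America
japan = analog_levels(10, 4)               # 480i, Japan: zero setup
ebu = analog_levels(7, 3)                  # 576i, HDTV, EBU N10
print(round(ntsc["black"], 2), round(black_to_white_ire(ntsc), 1))   # 53.57 92.5
print(ebu)   # {'sync': -300.0, 'blanking': 0.0, 'black': 0.0, 'white': 700.0}
```

With 7.5% setup, black sits about 53.6 mV above blanking, leaving 92.5 IRE from black to white. With zero setup, black and blanking coincide.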

Blanking level at an analog interface is established by 
a back porch clamp. However, in a system with setup, 
no signal element is present that enables a receiver to 
accurately recover black level. If an interface has poor 
tolerance, calibration error, or drift, setup causes prob- 
lems in maintaining accurate black-level reproduction. 
Consequently, setup has been abolished from modern 
video systems: Zero setup is a feature of EBU N10 
component video, all variants of 576i video, and HDTV.
In all of these systems, blanking level also serves as the 
reference level for black. 

480i video in Japan originally used setup. However, in about 1985, zero setup was adopted; 10:4 picture-to-sync ratio was retained. Consequently, there are now three level standards for analog video interface.

Figure 27.2 overleaf shows these variations.

The archaic EIA RS-343-A standard specifies monochrome operation, 2:1 interlace with 60.00 Hz field rate, 7 µs horizontal blanking, and other parameters that have no place in modern video systems. Unfortunately, most PC graphics display standards have inherited RS-343-A's 10:4 picture-to-sync ratio and 7.5% setup. (Some high-end workstations have zero setup.)
Figure 27.2 panels: 7.5% setup, 10:4 picture:sync; zero setup, 10:4 picture:sync; zero setup, 7:3 picture:sync. Axes: voltage (mV) and IRE units.



Figure 27.2 Comparison of 7.5% and zero setup. The left-hand third shows the video levels of composite 480i video, with 7.5% setup and 10:4 picture-to-sync ratio. This coding is used in some studio equipment and in most computer monitor interfaces. The middle third shows zero setup and 10:4 picture-to-sync, as used in 480i video in Japan. EBU N10 component video, 576i systems, and HDTV use zero setup, 700 mV picture, and 300 mV sync, as shown at the right.



The term pedestal refers to the absolute value of the 
offset from blanking level to black level, in IRE units or 
millivolts: Composite 480i NTSC incorporates a
pedestal of 7.5 IRE. Pedestal includes any deliberate offset
added to R', G', or B’ components, to luma, or to 
a composite video signal, to achieve a desired technical 
or aesthetic intent. In Europe, this is termed lift. 

(I prefer the term black level to either pedestal or lift.) 

Rec. 601 to computing 

The coding difference between computer graphics and 
studio video necessitates image data conversion at the 
interface. Figure 27.3 opposite shows the transfer func- 
tion that converts 8-bit Rec. 601 studio R'G'B' into 
computer R'G'B'. The footroom and headroom regions 
of Rec. 601 are clipped, and the output signal omits 36 
code values. This coding difference between computer 
graphics and studio video is one of many challenges in 
taking studio video into the computer domain. 
Figure 27.3 8-bit Rec. 601 to full-range (computer) R'G'B' conversion involves multiplying by a scale factor of 255/219, to account for the difference in range. This causes the footroom and headroom regions of the studio video signal to be clipped, and causes the output signal to be missing several code values. The detail shows the situation at mid-scale; the transfer function is symmetrically disposed around input pair [109, 110] and output pair [127, 128]. This graph shows a linear relationship from black to white. The linear relationship is suitable in computer systems where a ramp is loaded into the lookup table (LUT) between the framebuffer and the display; in that case, R'G'B' data is displayed on the computer monitor comparably to the way R'G'B' is displayed in video; see Gamma, on page 257.
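The count of missing output codes can be checked with a short sketch. The sketch assumes the conversion scales by 255/219, rounds to the nearest integer, and clips; the helper name studio_to_full is hypothetical. At mid-scale, studio codes 125 and 126 (that is, 109 and 110 above black) map to output codes 127 and 128.

```python
# Sketch: 8-bit studio R'G'B' (black 16, white 235) to full-range computer
# R'G'B' (black 0, white 255): scale by 255/219, round, clip.

def studio_to_full(code):
    value = round((code - 16) * 255 / 219)
    return min(max(value, 0), 255)        # footroom and headroom are clipped

produced = {studio_to_full(c) for c in range(256)}
print(256 - len(produced))                       # 36 output codes never occur
print(studio_to_full(125), studio_to_full(126))  # 127 128 at mid-scale
```

The 220 input codes from 16 through 235 can reach at most 220 of the 256 output codes, so 36 output codes never occur.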
Enhancement



The sharpness control in 
consumer receivers effects hori- 
zontal “enhancement" on the 
luma signal. 
This section and several subsequent sections discuss
enhancement, median filtering, coring, chroma transi- 
tion improvement (CTI), and scan-velocity modulation 
(SVM). Each of these operations superficially resembles 
FIR filtering: A "window" involving a small set of neigh- 
boring samples slides over the input data. For each new 
input sample, the filtering operation delivers one 
output sample that has been subject to some fixed time 
delay with respect to the input. Unlike FIR filtering, 
with the exception of the most benign forms of 
enhancement these operations are nonlinear. They 
cannot, in general, be undone. 

The term "enhancement" is widely used in image 
processing and video. It has no precise meaning. 
Evidently, the goal of enhancement is to improve, in 
some sense, the quality of an image. In principle, this 
can be done only with knowledge of the process or 
processes that degraded the image's quality. In prac- 
tice, it is extremely rare to have access to any history of 
the processes to which image data has been subject, so 
no systematic approach to enhancement is possible. 

In some applications, it may be known that image data 
has been subject to processes that have introduced 
specific degradations or artifacts. In these cases, 
enhancement may refer to techniques designed to 
reduce these degradations. A common example 
involves degraded frequency response due to aperture 
effects. Enhancement in this case, also known as aper- 
ture correction, is accomplished by some degree of high- 
pass filtering, either in the horizontal direction, the 
vertical direction, or both. Compensation of loss of 
optical MTF should be done in the linear-light domain; 
however, it is sometimes done in the gamma-corrected 
domain. Historically, vertical aperture correction in 
interlaced tube cameras (vidicons and plumbicons) was 
done in the interlaced domain. 

More generally, enhancement is liable to involve 
nonlinear processes that are based on some assump- 
tions about the properties of the image data. Unless 
signal flow is extremely well controlled, there is a huge 
danger in using such operations: Upon receiving image
data that has not been subject to the expected process, 
"enhancement" is liable to degrade the image, rather 
than improve it. For this reason, I am generally very 
strongly opposed to "enhancement." 

Median filtering 

In a median filter, each output sample is computed as 
the median value of the input samples under the 
window. Ordinarily, an odd number of taps are used; 
the median is the central value when the input samples 
are sorted by value. Median filtering usually involves 
a horizontal window with 3 taps. Occasionally, 5 taps 
are used; rarely, 3x3 spatial median filters are used. 

Any isolated extreme value, such as a large-valued 
sample due to impulse noise, will not appear in the 
output sequence of a median filter: Median filtering can 
be useful to reduce noise. However, a legitimate
extreme value will not be included either! I urge you to 
use great caution in imposing median filtering: If your 
filter is presented with image data whose statistics are 
not what you expect, you are very likely to degrade the 
image instead of improving it. 
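A 3-tap horizontal median can be sketched with the standard library. This is a minimal sketch; passing the two end samples through unchanged is an edge-handling assumption, and the function name is hypothetical.

```python
from statistics import median

def median3(row):
    """3-tap horizontal median filter; the two end samples pass through unchanged."""
    out = list(row)
    for i in range(1, len(row) - 1):
        out[i] = median(row[i - 1:i + 2])
    return out

print(median3([16, 16, 235, 16, 16]))     # [16, 16, 16, 16, 16]: a lone extreme vanishes
print(median3([16, 16, 235, 235, 235]))   # [16, 16, 235, 235, 235]: an edge survives
```

The first input could be an impulse-noise spike or a legitimate one-sample-wide white line. The filter cannot tell the difference, and it removes both.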

Coring 

Coring assumes that any low-magnitude, high-frequency 
signal components are noise. The input signal is sepa- 
rated into low- and high-frequency components using 
complementary filters. The low-frequency component is 
passed to the output. The magnitude of the high- 
frequency component is estimated, and the magnitude 
is subject to a thresholding operation. If the magnitude 
is below threshold, then the high-frequency compo- 
nent is discarded; otherwise, it is passed to the output 
through summation with the low-frequency compo- 
nent. Coring can be implemented by the block diagram 
shown in Figure 27.4 overleaf. 

Like median filtering, coring depends upon the statistical properties of the image data. If the image is a flat-shaded cartoon having large areas of uniform color with rapid transitions between them, then coring will eliminate noise below a certain magnitude. However, if the input is not a cartoon, you run the risk that coring will cause it to look like one! In a close-up of a face, skin texture produces a low-magnitude, high-frequency component that is not noise. If coring eliminates this component, the face will take on the texture of plastic.

Figure 27.4 Coring block diagram includes complementary filters that separate low- and high-frequency components. The high-frequency components are processed by the nonlinear transfer function in the sketch.
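The coring structure can be sketched on one row of luma. This is a minimal sketch with two stated assumptions. First, the lowpass filter is a simple [1, 2, 1]/4 kernel, and the highpass path is its exact complement (input minus lowpass). Second, a hard threshold stands in for the nonlinear transfer function sketched in Figure 27.4. Samples outside the row are taken as black, and the function name is hypothetical.

```python
# Sketch: coring one luma row with complementary filters and a hard threshold.

def core(row, threshold):
    """Split into low and high bands, drop small high-band values, recombine."""
    n = len(row)
    def at(i):
        return row[i] if 0 <= i < n else 0.0                           # outside: black
    low = [(at(i - 1) + 2 * at(i) + at(i + 1)) / 4 for i in range(n)]  # [1, 2, 1]/4 lowpass
    high = [x - l for x, l in zip(row, low)]                           # complementary highpass
    kept = [h if abs(h) >= threshold else 0.0 for h in high]           # hard threshold
    return [l + h for l, h in zip(low, kept)]

texture = [0.5, 0.52, 0.5, 0.52, 0.5, 0.52]   # low-magnitude detail, e.g. skin texture
print([round(v, 3) for v in core(texture, 0.05)])   # [0.5, 0.51, 0.51, 0.51, 0.51, 0.52]
```

A large step passes through intact, but the small alternating texture in the interior is flattened to its local average. This is the plastic-skin effect described above.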

Coring is liable to introduce spatial artifacts into an 
image. Consider an image containing a Persian carpet 
that recedes into the distance. The carpet's pattern will 
produce a fairly low spatial frequency in the foreground 
(at the bottom of the image); as the pattern recedes 
into the background, the spatial frequency of the 
pattern becomes higher and its magnitude becomes 
lower. If this image is subject to coring, beyond a cer- 
tain distance, coring will cause the pattern to vanish. 
The viewer will perceive a sudden transition from the 
pattern of the carpet to no pattern at all. The viewer 
may conclude that beyond a certain distance there is 
a different carpet, or no carpet at all. 

Chroma transition improvement (CTI) 

Color-under VCRs exhibit very poor color difference 
bandwidth (evidenced as poor chroma resolution in the 
horizontal direction). A localized change in luma may 
be faithfully reproduced, but the accompanying change 
in color difference components will be spread horizon- 
tally. If you assume that colored areas tend to be 
uniformly colored, one way of improving image quality 
is to detect localized changes in luma, and use that 
information to effect repositioning of color difference 
information. Techniques to accomplish this are collec- 
tively known as chroma transition improvement (CTI). 
If you use CTI, you run the risk of introducing excessive
emphasis on edges. Also, CTI operates only on the hori- 
zontal dimension: Excessive CTI is liable to become 
visible owing to perceptible (or even objectionable) 
differences between the horizontal and vertical charac- 
teristics of the image. CTI works well on cartoons, and 
on certain other types of images. However, it should be 
used cautiously. 

Scan-velocity modulation (SVM) 

CRT displays are subject to limitations of how fast the 
electron beam can change from one state to another 
(for example, from on to off). These limitations are 
imposed by limited bandwidth of the video amplifiers 
and by the electrical capacitance that the CRT's cathode 
or grid presents to the driving circuitry. These limita- 
tions are reflected as limited spatial resolution of the 
reproduced image: An edge is reproduced across a hori- 
zontal dimension wider than desired. 

One way to reduce the dimension of an edge involves 
making the electron beam position responsive to 
changes in beam intensity. If intensity is increasing 
rapidly as the beam goes left-to-right, the beam can be 
accelerated. If intensity is decreasing rapidly, the beam 
can be decelerated so that it dwells longer on areas of 
the screen that require high intensity. This process is 
scan-velocity modulation (SVM). Luma (or some compa- 
rable quantity) is processed through a high-pass filter; 
the result is amplified and applied to the horizontal 
deflection circuit of the CRT. 

Note that all of the red, green, and blue beams are 
deflected together: The technique is effective only on 
black-to-white or white-to-black transitions. In 
a magenta-to-green transition, both red and blue are 
negative-going, even though luma is increasing: In this 
example, SVM has unintended effects on red and blue. 

Mixing and keying 

Mixing video signals together to create a transition, or a layered effect (for example, to mix or wipe), is called compositing. In America, a piece of equipment that performs such effects is a production switcher. In Europe, the equipment (or the person that operates it!) is called a vision mixer.

Porter, Thomas, and Tom Duff, "Compositing Digital Images," in Computer Graphics, 18 (3): 253-259 (July 1984, Proc. SIGGRAPH '84). The terms composite and compositing are overused in video!

SMPTE RP 157, Key Signals.

Eq 27.1

$$
R = \alpha \cdot FG + (1 - \alpha) \cdot BG
$$

The most difficult part of keying is extracting ("pulling") the matte. For review from a computer graphics perspective, see Smith, Alvy Ray, and James F. Blinn, "Blue Screen Matting," in Computer Graphics, Annual Conference Series 1996 (Proc. SIGGRAPH 96), 259-268.

Accomplishing mix, wipe, or key effects in hardware 
also requires synchronous video signals - that is, signals 
whose timing matches perfectly in the vertical and hori- 
zontal domains. (In composite video, it is also neces- 
sary to match subcarrier phase, as I will describe.) 

Keying (or compositing) refers to superimposing a fore-
ground (FG, or fill video) image over a background (BG)
image. Keying is normally controlled by a key (or matte)
signal, coded like luma, that indicates the transparency
of the accompanying foreground image data, coded
between black (fully transparent) and white (fully
opaque). In computer graphics, the key signal (data) is
called alpha (α), and the operation is called compositing.

The keying (or compositing) operation is performed as 
in Equation 27.1. Foreground image data that has been 
premultiplied by the key is called shaped in video, or 
associated, integral, or premultiplied in computer 
graphics. Foreground image data that has not been 
premultiplied by the key is called unshaped in video, or 
unassociated or nonpremultiplied in computer graphics. 
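The keying equation, for both shaped and unshaped foregrounds, can be sketched per sample as follows. The function names are illustrative. The point is that shaped (premultiplied) foreground data has already absorbed the α · FG product, so only the background term remains to be multiplied.

```python
def key_unshaped(fg, bg, alpha):
    """Eq 27.1: R = alpha*FG + (1 - alpha)*BG, with FG unshaped
    (unassociated, nonpremultiplied)."""
    return alpha * fg + (1.0 - alpha) * bg

def key_shaped(fg_shaped, bg, alpha):
    """Same result when FG is already shaped (associated, premultiplied),
    i.e. fg_shaped = alpha * fg; only the background is multiplied."""
    return fg_shaped + (1.0 - alpha) * bg

fg, bg, alpha = 0.8, 0.2, 0.25
print(key_unshaped(fg, bg, alpha))          # about 0.35
print(key_shaped(alpha * fg, bg, alpha))    # about 0.35
```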

The multiplication of foreground and background data 
in keying is equivalent to modulation: This can produce 
signal components above half the sampling rate, 
thereby producing alias components. Aliasing can be 
avoided by upsampling the foreground, background, 
and key signals; performing the keying operation at 
twice the video sampling rate; then suitably filtering 
and downsampling the result. Most keyers operate 
directly at the video sampling rate without upsampling 
or downsampling, and so exhibit some aliasing. 
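The aliasing mechanism comes down to frequency arithmetic. Multiplying a component at f1 by a component at f2 yields components at f1 + f2 and |f1 - f2|, and any component above half the sampling rate folds back. The 5 MHz and 3 MHz figures below are hypothetical. 13.5 MHz is the Rec. 601 luma sampling rate.

```python
def fold(f, fs):
    """Apparent frequency of a real component at f after sampling at fs."""
    f = f % fs
    return fs - f if f > fs / 2 else f

def product_components(f1, f2, fs):
    """Multiplying components at f1 and f2 (as keying multiplies FG or BG by
    the key) creates f1 + f2 and |f1 - f2|; return where they land."""
    return sorted({fold(f1 + f2, fs), fold(abs(f1 - f2), fs)})

# Hypothetical example: 5 MHz picture detail keyed by a 3 MHz key transition.
print(product_components(5.0, 3.0, 13.5))   # [2.0, 5.5]: 8 MHz aliases to 5.5 MHz
print(product_components(5.0, 3.0, 27.0))   # [2.0, 8.0]: no alias at twice the rate
```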

To mimic optical compositing, keying should be 
performed in the linear-light domain. However, keying 
in video is usually done in the gamma-corrected 
domain. (A key signal in video is sometimes called
linear key; this does not refer to linear light, but to
a key signal representing opacity with more than just 
the two levels fully transparent and fully opaque.) 
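The difference between keying in the gamma-corrected domain and keying in linear light shows up clearly at a 50% key of white over black. This sketch assumes a pure power-law transfer function with exponent 2.2, for illustration only. Real video transfer functions, such as that of Rec. 709, include a linear segment near black.

```python
GAMMA = 2.2   # assumed pure power-law transfer, for illustration only

def to_linear(code):
    return code ** GAMMA

def to_code(linear):
    return linear ** (1.0 / GAMMA)

def key_in_code_domain(fg, bg, alpha):
    """What most video keyers do: mix gamma-corrected values directly."""
    return alpha * fg + (1 - alpha) * bg

def key_in_linear_light(fg, bg, alpha):
    """What optical compositing does: mix intensities, then re-encode."""
    return to_code(alpha * to_linear(fg) + (1 - alpha) * to_linear(bg))

print(key_in_code_domain(1.0, 0.0, 0.5))             # 0.5 (only about 22% of white's intensity)
print(round(key_in_linear_light(1.0, 0.0, 0.5), 3))  # 0.73 (50% of white's intensity)
```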
NTSC and PAL
chroma modulation 



28 



See Appendix A, YUV and luminance 
considered harmful, on page 595. 



In Introduction to composite NTSC and PAL, on 
page 103, I outlined composite NTSC and PAL color 
encoding. This chapter details how an encoder forms U
and V color difference components, how modulated
chroma (C) is formed, and how a decoder demodulates
back to U and V. The following chapter, NTSC and PAL
frequency interleaving, on page 349, explains how an 
encoder sums luma and chroma to form a composite 
NTSC or PAL signal, and how a decoder separates luma 
and modulated chroma prior to chroma demodulation. 

The designers of NTSC color television intended that
chroma would be based upon I and Q components.
Nowadays, I and Q components are essentially obso-
lete, and U and V components are generally used. For
details, see NTSC Y'IQ system, on page 365.

Y'UV coding is unique to composite NTSC and PAL:
It has no place in component video, HDTV, or
computing. If chroma components are to be kept sepa-
rate, it is incorrect to apply the U and V scaling, or to
use U and V notation. Unfortunately, the Y'UV nota-
tion - or, carelessly written, YUV - is often used nowa-
days to denote any component system involving two
scaled color difference components based upon B'-Y'
and R'-Y' where the scaling is unknown or implicit.

Even worse, the notation YUV is sometimes used: The
unprimed Y suggests luminance, but no YUV system
uses linear-light luminance, and if luminance were actu-
ally used, the UV scaling would be incorrect.
Figure 28.1 UV components are scaled so as to limit the composite (Y'+C) excursion. UV scaling is inappropriate if the components are kept separate.

This sketch reflects the display of a colorbar test signal on an NTSC vectorscope; the colorbar signal is described on page 535.

If the figure rotates with respect to burst, a chroma phase error has occurred; upon display, this will manifest itself as a hue error.

[Figure 28.1 vectorscope plot: U axis (sin) horizontal, V axis (cos) vertical, scale marked at +0.5.]

The exact values are these:

$$k_B = \frac{1}{3}\sqrt{\frac{209556997}{96146491}}; \qquad k_R = \sqrt{\frac{221990474}{288439473}}$$

Incidentally, the ratio k_B : k_R is almost exactly 9:16.



UV components 

In Component video color coding for SDTV, on page 301, I outlined the formation of luma and color difference components. In encoding NTSC, PAL, or S-video, the color difference components are scaled so that the eventual composite signal is contained within the amplitude limits of VHF and UHF television transmitters. It is standard to limit the composite excursion to the range [-1/3 ... +4/3] of unity luma excursion. To this end, the B'-Y' and R'-Y' components are scaled according to Equation 28.1 by factors k_B and k_R to form the U and V components depicted in Figure 28.1.

$$U = k_B\,(B' - {}^{601}Y'); \qquad V = k_R\,(R' - {}^{601}Y') \qquad \text{Eq 28.1}$$

The scale factors are chosen to satisfy this constraint:

$$-\frac{1}{3} \le {}^{601}Y' \pm \sqrt{U^2 + V^2} \le \frac{4}{3} \qquad \text{Eq 28.2}$$

I described B'-Y', R'-Y' components for SDTV on page 303. The maximum excursions of the individual color difference components occur at the blue and red primaries. Since these points are not located on the axes where scaling takes place, the scale factors are derived from two simultaneous equations involving B'-Y' and R'-Y'. The scale factors were once standardized to three digits, rounded to 0.493 and 0.877; in the contemporary SMPTE 170M standard, they are expressed to six digits. B'-Y' and R'-Y' components are transformed to UV through these relations:



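$$U = 0.492111\,(B' - {}^{601}Y') \approx \frac{1}{2.03}\,(B' - {}^{601}Y'); \qquad V = 0.877283\,(R' - {}^{601}Y') \approx \frac{1}{1.14}\,(R' - {}^{601}Y') \qquad \text{Eq 28.3}$$

The scaling can be expressed in matrix form:

$$\begin{bmatrix} {}^{601}Y' \\ U \\ V \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.492111 & 0 \\ 0 & 0 & 0.877283 \end{bmatrix} \cdot \begin{bmatrix} {}^{601}Y' \\ B' - {}^{601}Y' \\ R' - {}^{601}Y' \end{bmatrix} \qquad \text{Eq 28.4}$$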
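The two simultaneous equations can be solved exactly with rational arithmetic. This sketch assumes Rec. 601 luma weights and the lower limit Y' - |C| = -1/3 at the blue and red primaries. It treats k_B² and k_R² as the unknowns, which makes the system linear, and solves it by Cramer's rule. The function names are illustrative.

```python
import math
from fractions import Fraction as F

# Rec. 601 luma weights as exact rationals.
WR, WG, WB = F(299, 1000), F(587, 1000), F(114, 1000)

def color_differences(r, g, b):
    y = WR * r + WG * g + WB * b
    return y, b - y, r - y

def solve_scale_factors():
    """Return (k_B**2, k_R**2) such that Y' - sqrt(U**2 + V**2) = -1/3
    exactly at the blue and red primaries."""
    rows = []
    for rgb in ((0, 0, 1), (1, 0, 0)):            # blue, then red
        y, b_minus_y, r_minus_y = color_differences(*rgb)
        chroma = y + F(1, 3)                      # required |C| at this primary
        rows.append((b_minus_y ** 2, r_minus_y ** 2, chroma ** 2))
    (a1, b1, c1), (a2, b2, c2) = rows
    det = a1 * b2 - a2 * b1                       # Cramer's rule, 2 x 2
    kb2 = (c1 * b2 - c2 * b1) / det
    kr2 = (a1 * c2 - a2 * c1) / det
    return kb2, kr2

kb2, kr2 = solve_scale_factors()
print(kb2 == F(209556997, 9 * 96146491), kr2 == F(221990474, 288439473))  # True True
print(round(math.sqrt(kb2), 6), round(math.sqrt(kr2), 6))                 # 0.492111 0.877283
```

Because yellow and cyan are the complements of blue and red, the same factors make yellow and cyan reach exactly +4/3.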



To obtain Y'UV from R'G'B', concatenate the matrix of 
Equation 28.4 above with the matrix of Equation 25.1, 
on page 303: 





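$$\begin{bmatrix} {}^{601}Y' \\ U \\ V \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.147141 & -0.288869 & 0.436010 \\ 0.614975 & -0.514965 & -0.100010 \end{bmatrix} \cdot \begin{bmatrix} R' \\ G' \\ B' \end{bmatrix} \qquad \text{Eq 28.5}$$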



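To recover R'G'B' from Y'UV, invert Equation 28.5:

$$\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1.139883 \\ 1 & -0.394642 & -0.580622 \\ 1 & 2.032062 & 0 \end{bmatrix} \cdot \begin{bmatrix} {}^{601}Y' \\ U \\ V \end{bmatrix} \qquad \text{Eq 28.6}$$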
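The forward and inverse matrices can be checked numerically. This sketch transcribes the coefficients of Equations 28.5 and 28.6 and applies them with a small hand-written matrix-vector product. The helper names are illustrative.

```python
# Coefficients transcribed from Eq 28.5 (R'G'B' -> Y'UV) and Eq 28.6 (inverse).
RGB_TO_YUV = (
    (0.299, 0.587, 0.114),
    (-0.147141, -0.288869, 0.436010),
    (0.614975, -0.514965, -0.100010),
)
YUV_TO_RGB = (
    (1.0, 0.0, 1.139883),
    (1.0, -0.394642, -0.580622),
    (1.0, 2.032062, 0.0),
)

def mat_vec(m, v):
    """Multiply a 3 x 3 matrix (tuple of rows) by a 3-vector."""
    return tuple(sum(c * x for c, x in zip(row, v)) for row in m)

def rgb_to_yuv(r, g, b):
    return mat_vec(RGB_TO_YUV, (r, g, b))

def yuv_to_rgb(y, u, v):
    return mat_vec(YUV_TO_RGB, (y, u, v))

print([round(x, 6) for x in rgb_to_yuv(1.0, 1.0, 0.0)])   # yellow: [0.886, -0.43601, 0.10001]
print([round(x, 5) for x in yuv_to_rgb(*rgb_to_yuv(0.2, 0.5, 0.9))])  # [0.2, 0.5, 0.9]
```

White maps to U = V = 0, as it must, because each chroma row of Equation 28.5 sums to zero.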



$$-233\tfrac{1}{3} = -\tfrac{1}{3} \cdot 700; \qquad 933\tfrac{1}{3} = +\tfrac{4}{3} \cdot 700$$



The IRE unit is introduced on 
page 326. 



For details, see page 516. 



The k_B and k_R scale factors apply directly to PAL: For a luma excursion of 700 mV, the PAL composite signal ranges about -233 mV to +933 mV.

As I will detail in Setup (pedestal), on page 327, NTSC (except in Japan) has 7.5% setup. Setup reduces luma excursion by the fraction 37/40, and places luma on a pedestal of 7.5 IRE. The k_B and k_R scale factors were computed disregarding setup, so the NTSC composite signal has an excursion of -23⅓ IRE to +130⅚ IRE, not quite the -33⅓ IRE to +133⅓ IRE excursion implied by the range [-1/3 ... +4/3].
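The excursion figures above follow from short exact arithmetic. This sketch assumes that setup scales the whole signal, chroma included, to 92.5 IRE (the fraction 37/40) and then adds the 7.5 IRE pedestal. That assumption reproduces the quoted NTSC numbers.

```python
from fractions import Fraction as F

LOW, HIGH = F(-1, 3), F(4, 3)   # composite limits relative to unity luma

def pal_excursion_mv(luma_mv=700):
    """PAL: no setup, so the limits scale directly with the luma excursion."""
    return LOW * luma_mv, HIGH * luma_mv

def ntsc_excursion_ire(setup_ire=F(15, 2)):
    """NTSC with setup: signal scaled to (100 - setup) IRE, then raised by setup."""
    scale = 100 - setup_ire                      # 92.5 IRE, i.e. 37/40 of 100
    return setup_ire + LOW * scale, setup_ire + HIGH * scale

print([float(x) for x in pal_excursion_mv()])     # about -233.33 and 933.33 mV
print([float(x) for x in ntsc_excursion_ire()])   # about -23.33 and 130.83 IRE
```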



The +4/3 limit applies to composite analog or digital studio equipment, and to PAL transmission. However, a hard limit of +1.2 (120 IRE) applies to terrestrial (VHF/UHF) NTSC transmitters, and a practical limit of about +1.15 is appropriate to avoid interference problems with sound in old television receivers. Many computer graphics systems have provisions to check or limit chroma excursion. If the composite signal is simply clipped, artifacts will result: Chroma clipping must be accomplished by more sophisticated techniques.

Figure 28.2 Spectra of U, V, and modulated chroma are sketched for 1.3 MHz baseband chroma bandwidth typical of the studio. Modulated chroma (C) consumes 2.6 MHz of bandwidth, centered on the color subcarrier frequency. The crosshatching suggests that information from both the U and V components is found across the 2.6 MHz band.

Not all RGB-legal colors are NTSC-legal: Some RGB-legal saturated yellow and cyan colors fall outside NTSC transmitter limits. An algorithm to reduce chroma saturation in R'G'B' images is described in Martindale, David, and Alan W. Paeth, "Television color encoding," in Graphics Gems II, edited by James Arvo (Boston: Academic Press, 1991).

In the rare cases that vertical filtering is performed prior to NTSC or PAL encoding, this is called precombing.
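One gentler alternative to clipping can be sketched as follows, under stated assumptions. U and V are scaled by a common factor just far enough that Y' ± |C| stays within the limits, so luma and hue are preserved and only saturation drops. This is an illustrative technique, not the Martindale and Paeth algorithm. The function name and default limits are assumptions, and setup is ignored.

```python
import math

def limit_chroma(y, u, v, low=-1/3, high=4/3):
    """Scale U and V by one factor so that Y' +/- |C| stays in [low, high].
    Luma and hue (the angle of [U, V]) are unchanged; saturation is reduced."""
    c = math.hypot(u, v)
    if c == 0.0:
        return y, u, v
    s = 1.0
    if y + c > high:
        s = min(s, (high - y) / c)
    if y - c < low:
        s = min(s, (y - low) / c)
    s = max(s, 0.0)
    return y, u * s, v * s

# Fully saturated yellow reaches +4/3, which exceeds a +1.2 NTSC transmitter limit.
y, u, v = limit_chroma(0.886, -0.43601, 0.10001, high=1.2)
print(round(y + math.hypot(u, v), 6))   # 1.2
```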

NTSC chroma modulation 

Once U and V components are formed, in studio video
each component is lowpass-filtered to about 1.3 MHz, 
as sketched at the left-hand side of Figure 28.2 above. 
(Luma is processed through a matching delay.) Histori- 
cally, in analog encoders, only horizontal filtering was 
done. Vertical filtering ought to be performed also, to 
enable good performance from comb filter decoders. 
However, despite the potential for improved quality, 
vertical filtering is very rarely performed at encoders. 

After matrixing, scaling, and filtering, U and V components are combined into a single modulated chroma signal C, using quadrature modulation onto a continuous-wave color subcarrier. Chroma modulation is achieved by simultaneously multiplying U by sine of the subcarrier and multiplying V by cosine of the subcarrier, then summing the products:

$$C = U \sin \omega t + V \cos \omega t; \qquad \omega = 2\pi f_{sc} \qquad \text{Eq 28.7}$$

In Equation 28.7, sin ωt represents the subcarrier, typically about 3.58 MHz for NTSC or 4.43 MHz for PAL.
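Equation 28.7 can be evaluated directly. This sketch assumes the NTSC subcarrier of 315/88 MHz (about 3.579545 MHz) and samples one subcarrier cycle at 1° steps. The peak of the modulated chroma equals the length of the [U, V] vector. The function names are illustrative.

```python
import math

NTSC_FSC = 315e6 / 88   # NTSC color subcarrier, about 3.579545 MHz

def modulated_chroma(u, v, t, fsc=NTSC_FSC):
    """Eq 28.7: C = U sin(wt) + V cos(wt), with w = 2*pi*fsc."""
    w = 2.0 * math.pi * fsc
    return u * math.sin(w * t) + v * math.cos(w * t)

def sample_one_cycle(u, v, n=360, fsc=NTSC_FSC):
    period = 1.0 / fsc
    return [modulated_chroma(u, v, k * period / n, fsc) for k in range(n)]

samples = sample_one_cycle(0.3, 0.4)
print(round(max(samples), 4))   # 0.5 = sqrt(0.3**2 + 0.4**2)
```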
It is unfortunate that the formulation used in video is reversed from the convention established by Euler in 1748, and ubiquitous in mathematics, that

$$e^{i\theta} = \cos\theta + i\sin\theta, \qquad i^2 = -1$$

where i is associated with the y-axis. The video formulation also opposes convention in communications theory, where the carrier is considered to be a cosine wave, not a sine wave.




[Figure 28.3 sketch: chroma subcarrier; phase encodes hue, amplitude encodes saturation.]

Figure 28.3 Quadrature modulation can be viewed as simultaneous phase and amplitude modulation.



The exact subcarrier frequency is not important to
quadrature modulation, but it is critical to frequency
interleaving, as I will detail in NTSC and PAL frequency
interleaving, on page 349.

The sum of the sine and cosine terms is then bandpass 
filtered, producing the modulated chroma spectrum 
sketched at the right of Figure 28.2. Modulated chroma 
is centered on the subcarrier. With U and V bandwidth
of 1.3 MHz, the lower sideband extends 1.3 MHz below 
the subcarrier frequency, and the upper sideband 
extends 1.3 MHz above. (In NTSC and PAL frequency 
interleaving, on page 349, I will discuss how much 
chroma bandwidth is preserved in transmission.) 

If you transform the two color differences U and V from
rectangular to polar coordinates, you can think of U and
V as being conveyed by a combination of phase and
amplitude modulation, as suggested by Figure 28.3 in
the margin. Consider the point in the chroma plane
plotted at coordinates [U, V]. The angle from the x-axis
to the point relates to the hue attribute of the associ-
ated color; this quantity effectively modulates subcar-
rier phase. The distance from the origin to the point
[U, V] relates to the saturation of the color; this quan-
tity effectively modulates subcarrier amplitude.
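The rectangular-to-polar view can be sketched with `math.hypot` and `math.atan2`. The angle is measured from the +U (x) axis toward +V, which is an assumed plotting convention. The yellow and red [U, V] values below come from Equation 28.5.

```python
import math

def chroma_polar(u, v):
    """Return (amplitude, phase in degrees from the +U axis toward +V)."""
    return math.hypot(u, v), math.degrees(math.atan2(v, u)) % 360.0

yellow = chroma_polar(-0.43601, 0.10001)
red = chroma_polar(-0.147141, 0.614975)
print(round(yellow[0], 4), round(yellow[1], 1))   # 0.4473 167.1
print(round(red[0], 4), round(red[1], 1))         # 0.6323 103.5
```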

It is standard to sample digital composite video at four 
times the subcarrier frequency, 4f_sc. Early digital NTSC
systems sampled on the [B'-Y', R'-Y'] axes - that is, 
sampling took place at the 0°, 90°, 180°, and 270° 
phases of subcarrier, so the sine subcarrier took values 
chosen cyclically from {0, 1, 0, -1}; the cosine subcar- 
rier took values chosen cyclically from {1, 0, -1, 0}. 
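The subcarrier values quoted above follow from evaluating U sin θ + V cos θ at four phases spaced 90° apart. This sketch uses a hypothetical function that starts at an adjustable phase (0° by default). With U = 1, V = 0 it gives the sine values, with U = 0, V = 1 it gives the cosine values, and with both present it gives the chroma sample sequence.

```python
import math

def samples_4fsc(u, v, start_deg=0.0):
    """Modulated chroma U*sin(theta) + V*cos(theta) at four phases 90 deg apart."""
    out = []
    for k in range(4):
        theta = math.radians(start_deg + 90.0 * k)
        out.append(round(u * math.sin(theta) + v * math.cos(theta), 12))
    return out

print(samples_4fsc(1.0, 0.0))   # sine subcarrier:   [0.0, 1.0, 0.0, -1.0]
print(samples_4fsc(0.0, 1.0))   # cosine subcarrier: [1.0, 0.0, -1.0, -0.0]
print(samples_4fsc(0.3, 0.2))   # chroma: [0.2, 0.3, -0.2, -0.3], i.e. {V, U, -V, -U}
```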

In early 4f_sc NTSC systems, multiplying sine by U and
cosine by V, and adding the two products, gave digital
modulated chroma samples {V, U, -V, -U}. However,
when the SMPTE 244M standard was established for
4f_sc NTSC, it called for sampling on the [I, Q] axes -
that is, at the 33° phase of subcarrier. In modern 4f_sc
NTSC equipment, digital modulated chroma samples
take values chosen cyclically from {I, Q, -I, -Q}. In the



CHAPTER 28 



NTSC AND PAL CHROMA MODULATION 



339 



following chapter, I will detail how luma is added to modulated chroma; adding luma yields the 4f_sc NTSC sample sequence {Y'+I, Y'+Q, Y'-I, Y'-Q}.

[Figure 28.4 block diagram: 1.3 MHz lowpass filters on U and V, subcarrier oscillator, quadrature modulator, adder, chroma bandpass filter, burst inverter, burst inserter, and composite modulator combiner summing Y' and C.]

Figure 28.4 NTSC encoder block diagram. A subcarrier oscillator generates sin and cos continuous waves. Quadrature modulation is performed on lowpass-filtered U and V components by a pair of 4-quadrant multipliers. An adder forms modulated chroma, C. Chroma and luma Y' are summed to form composite video. Frequency interleaving is achieved when subcarrier is coherent with scanning.



If the phase of burst is altered, 
and the phase of modulated 
subcarrier is altered by the same 
amount, an analog chroma 
demodulator will still produce the 
correct color. What matters is the 
phase relationship between the 
two. An analog decoder cannot 
determine the phase at which 
chroma modulation was 
performed by an encoder. 



Chroma demodulation depends on the decoder having 
access to the continuous-wave color subcarrier used in 
encoding. To this end, an encoder inserts a brief burst 
of the inverted sine subcarrier into the horizontal 
blanking interval. (See Figure 42.1, on page 512.) 

Figure 28.4 above shows the block diagram of an NTSC 
encoder. An S-video interface provides Y' and C signals
after modulation but prior to summation. In S-video, 
burst is inserted onto the modulated chroma signal. 

NTSC chroma demodulation 

An NTSC decoder is shown in Figure 28.5 at the top of 
the facing page. Decoding begins with separation of the 
luma and modulated chroma components - 
Y'/C separation, which I will detail in the following 
chapter, NTSC and PAL frequency interleaving. For an 
S-video input signal, the Y'/C separator is bypassed. 
[Figure 28.5 block diagram: composite NTSC video input.]

Figure 28.5 NTSC decoder block diagram. Y'/C separation is accomplished using a "notch" filter or a comb filter. Subcarrier is regenerated from burst. Separated chroma is independently multiplied by continuous sin and cos waves in quadrature; the products are lowpass filtered to recover U and V. To recover wideband U and V in the studio, use 1.3 MHz filters. To recover narrowband U and V in consumer applications, use 600 kHz filters. For VHS VCRs, use 300 kHz filters.



The decoder reconstructs the continuous-wave color 
subcarrier that was used at encoding. I will describe 
that process in Subcarrier regeneration, on page 344. 



In NTSC Y'IQ system, on page 365, I will explain why it is futile to attempt to recover more than about 600 kHz of chroma bandwidth from terrestrial VHF/UHF NTSC.



The separated modulated chroma is then multiplied 
simultaneously by sine and cosine of the regenerated 
subcarrier. The products are lowpass filtered to recover 
the baseband U and V components. Luma is processed
through a matching delay. 
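Demodulation reverses Equation 28.7, as this sketch shows. It assumes a perfectly regenerated subcarrier and an integer number of samples per cycle. It also stands in a plain average over whole cycles for the decoder's lowpass filter, which is much cruder than a real 1.3 MHz or 600 kHz filter. Multiplying by 2 sin and 2 cos isolates U and V because the cross products average to zero over whole cycles.

```python
import math

def modulate(u, v, n_per_cycle, n_cycles):
    """Eq 28.7 sampled n_per_cycle times per subcarrier cycle."""
    return [u * math.sin(2 * math.pi * k / n_per_cycle)
            + v * math.cos(2 * math.pi * k / n_per_cycle)
            for k in range(n_per_cycle * n_cycles)]

def demodulate(chroma, n_per_cycle):
    """Multiply by 2*sin and 2*cos of the regenerated subcarrier, then
    average over the whole record (a crude stand-in for a lowpass filter)."""
    u_sum = v_sum = 0.0
    for k, c in enumerate(chroma):
        theta = 2 * math.pi * k / n_per_cycle
        u_sum += 2.0 * c * math.sin(theta)
        v_sum += 2.0 * c * math.cos(theta)
    return u_sum / len(chroma), v_sum / len(chroma)

u, v = demodulate(modulate(-0.43601, 0.10001, 16, 10), 16)
print(round(u, 6), round(v, 6))   # -0.43601 0.10001
```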



Even in NTSC, modulated chroma 
inverts phase on alternate lines! 
The A in PAL refers not to this 
alternation, but to the line-by-line
alternation of phase of the V
chroma component.



Provided that U and V are limited in bandwidth to less
than half the subcarrier frequency, chroma modulation 
is reversible without information loss. In practice, 
chroma modulation itself introduces no significant 
impairments, although the bandwidth limitation of the 
color difference signals removes color detail. 

PAL chroma modulation 

Analog transmission is susceptible to differential phase error, whereby the phase of modulated chroma is influenced by luma (as I will describe on page 541). In NTSC, these errors cause incorrectly decoded hue; vision is quite sensitive to hue errors. PAL augments the NTSC system with a V-axis inverter, which alternates the phase of the modulated V component line-by-line. (PAL derives its acronym from this Phase Alternation at Line rate.) The V-axis alternation causes any phase error in modulated chroma to take alternate directions on alternate lines. These phase errors average at the receiver, and thereby cancel. Chroma phase error causes loss of saturation, but that is less objectionable than hue shift. PAL is inherently insensitive to hue error induced by differential phase (DP). Figure 28.6 shows the block diagram of a PAL encoder.

Figure 28.6 PAL encoder block diagram augments the NTSC encoder of Figure 28.4 with a V-axis inverter. Also, the alternating cosine phase of subcarrier makes a contribution to burst.
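The cancellation can be sketched with complex numbers, where U is the real part and V the imaginary part. The sketch assumes two adjacent lines that carry the same color and suffer the same hypothetical phase error. Inverting V is complex conjugation, so the decoder's re-inversion turns the PAL line's error into an error of the opposite sign. The average then keeps the original hue, with magnitude scaled by cos of the error.

```python
import cmath
import math

def pal_two_line_average(u, v, phase_error_deg):
    """Average one "NTSC line" and one "PAL line" carrying the same color,
    both rotated by the same (hypothetical) chroma phase error."""
    err = cmath.exp(1j * math.radians(phase_error_deg))
    ntsc_line = complex(u, v) * err       # V normal
    pal_line = complex(u, -v) * err       # V inverted by the encoder
    restored = pal_line.conjugate()       # decoder re-inverts V: error now has opposite sign
    return (ntsc_line + restored) / 2     # errors cancel in phase, not in amplitude

c = complex(-0.43601, 0.10001)            # yellow, from Eq 28.5
avg = pal_two_line_average(c.real, c.imag, 10.0)
print(round(abs(avg) / abs(c), 4))        # 0.9848 = cos(10 deg): saturation reduced
```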



Old texts refer to a PAL-S (simple 
PAL) decoder, which operates 
without a delay element, and 
a PAL-D (deluxe PAL) decoder, 
which uses a 1 H line delay. Nowa- 
days, virtually all PAL receivers are 
deluxe PAL. This use of S and D is 
unrelated to the letters B, D, G, H, 
I, M, or N that may follow PAL; 
those refer to transmission stan- 
dards. PAL-D, in the second sense, 
is used for broadcasting in China. 



A PAL decoder - sketched in Figure 28.7 opposite - is essentially an NTSC decoder augmented by a V-axis inverter and a U/V separator. The U/V separator produces modulated chroma components that I denote C_U and C_V, based upon PAL's V-axis alternation, as I will detail in the following chapter, NTSC and PAL frequency interleaving. In modern PAL decoders, U/V separation uses a comb filter with at least a one-line (1H) delay. Unlike an NTSC comb filter, PAL's comb filter separates U and V; it does not separate luma from chroma! (U/V separation is intrinsic in NTSC's quadrature demodulator; no circuitry is dedicated to that function.)



[Figure 28.7 block diagram: composite PAL video input.]

Figure 28.7 PAL decoder block diagram begins with Y'/C separation, usually accomplished by a "notch" filter. Modulated U and V chroma components are separated, usually by a comb filter. Regenerated subcarrier is processed through PAL's V-axis inverter prior to demodulating chroma.

In 4f_sc digital PAL, it is standard to sample on the [U+V, U-V] axes, at 45° with respect to subcarrier (i.e., 0° with respect to PAL burst). This results in modulated chroma samples chosen cyclically from these values:

$$\{U \pm V,\; -U \pm V,\; -U \mp V,\; U \mp V\} \qquad \text{Eq 28.8}$$

The ± symbol designates addition on lines where V is
normal ("NTSC lines") and subtraction on lines where V
is inverted ("PAL lines"). The ∓ symbol designates
subtraction on lines where V is normal and addition on 
lines where V is inverted. 

The V-axis switch causes a vectorscope display of the
color test pattern to display two hexagons, as sketched
in Figure 28.8 overleaf. Associated with PAL's V-axis
switch, burst alternates at line rate, between +135° and
-135° with respect to subcarrier (compared to a fixed
180° burst phase for NTSC). This is called Bruch burst or
swinging burst. Burst alternation enables a PAL decoder
to recover the V-switch polarity.

PAL differs from NTSC in several other aspects to be 
detailed in Frequency interleaving in PAL, on page 355. 

The advantage of PAL's V-axis inverter is offset by increased complexity both at the encoder and the decoder. While PAL's modifications to NTSC once conferred an incremental performance advantage, that advantage has long since been subverted by the great difficulty of exchanging program material between countries, and by the expense of producing professional, industrial, and consumer equipment that operates with both standards. PAL's differences must now be seen as premature optimization. I hope that European television engineers will think twice before again adopting a uniquely European standard.

Figure 28.8 UV components in PAL display two overlaid hexagons on a vectorscope. One portion is identical to the NTSC pattern in Figure 28.1, on page 336; the associated lines (where PAL burst takes +135° phase) are sometimes called NTSC lines. The other lines, where the V component is inverted (and PAL burst takes -135° phase), are sometimes called PAL lines.

Subcarrier regeneration in the digital domain requires a digital phase-locked loop (DPLL), one component of which is a number-controlled oscillator (NCO) or a direct digital synthesizer (DDS).

Subcarrier regeneration 

An analog NTSC or PAL decoder regenerates subcarrier 
using a specialized phase-locked loop (PLL) circuit like 
that shown in Figure 28.9 at the top of the facing page. 
Continuous-wave cosine and sine wave signals are 
generated in a crystal oscillator; their frequency and 
phase are updated once per line by a phase compar- 
ator, based on a comparison with the signal's burst. The 
loop filter is a lowpass filter with a time constant of 
about 10 lines. 



The most straightforward way to generate burst would be to sample the sine phase of subcarrier directly. However, the NTSC (National Television System Committee) worried that if a receiver had poor blanking, burst might become visible. The potential for burst visibility was minimized by using the inverted sine subcarrier: A burst inverter is included in the NTSC encoder of Figure 28.4, and in the PAL encoder of Figure 28.6. In Figure 28.9, I indicate regeneration of subcarrier itself (at 0°), but keep in mind that the sine phase of subcarrier is inverted from burst phase. SMPTE 170M uses the term "burst-locked sinewave"; this reflects the preference of some decoder designers to regenerate a continuous wave at burst phase (180°).

Figure 28.9 Subcarrier regeneration

Techniques to regenerate subcarrier from consumer devices are described in Consumer analog NTSC and PAL, on page 579.

The circuit of Figure 28.9 can be directly used to recon- 
struct PAL color subcarrier: The two burst phases 
average in the loop filter to 180°. The V-switch polarity
is derived from the phase error, or from demodulated V, 
during burst. Alternatively, a more sophisticated circuit 
can process the +135° and -135° burst phases explicitly. 
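The loop filter's averaging of burst phase can be illustrated with a toy model. This is not a PLL design. It assumes one burst phase measurement per line, represents each measurement as a unit phasor, and smooths the phasors with a one-pole lowpass whose time constant is about 10 lines. The function name and the phasor representation are illustrative assumptions. Averaging phasors rather than raw angles matters here: the numbers +135 and -135 average to 0, but their phasors average to 180°.

```python
import cmath
import math

def loop_filter_burst(burst_phases_deg, time_constant_lines=10.0):
    """Toy loop filter: smooth one burst phasor per line with a one-pole
    lowpass and return the resulting phase in degrees (0..360)."""
    alpha = 1.0 / time_constant_lines
    state = 0j
    for phase in burst_phases_deg:
        measured = cmath.rect(1.0, math.radians(phase))
        state += alpha * (measured - state)
    return math.degrees(cmath.phase(state)) % 360.0

ntsc = loop_filter_burst([180.0] * 200)
pal = loop_filter_burst([135.0 if n % 2 == 0 else -135.0 for n in range(200)])
print(round(ntsc, 3))   # 180.0
print(round(pal, 1))    # close to 180, with a few degrees of line-rate ripple
```

With a 10-line time constant, the PAL result keeps a ripple of about 3° that alternates in sign from line to line. A longer time constant reduces it. The alternating sign of that ripple reflects the same line-by-line phase error from which the V-switch polarity can be derived.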



Although 4f_sc NTSC systems sample on the [I, Q] axes, narrowband Q is rarely used. See NTSC Y'IQ system, on page 365.



The obvious choices of sampling phase in a composite digital 4f_sc system are 0°, 90°, 180°, and 270°, so that subcarrier samples take the values {0, 1, 0, -1}. But no! It is standard to sample composite 4f_sc NTSC on the [I, Q] axes, at 33°, 123°, 213°, and 303° with respect to the sine (0°) subcarrier; it is standard to sample composite 4f_sc PAL systems on the [U+V, U-V] axes, at 45°, 135°, 225°, and 315° with respect to the sine subcarrier.



Burst is present on virtually all composite video signals 
nowadays, even black and white signals. However, if 
the subcarrier regenerator detects that burst is absent, 
then a color killer should force demodulated U and V to
zero so as to reproduce a grayscale ("black-and-white") 
picture. 
S-video interface

The S-video connector is sketched on page 409. Electrical and mechanical details are presented in S-video-525 (Y'/C3.58), on page 515, and S-video-625 (Y'/C4.43), on page 531.

The S-video interface - sometimes denoted Y'/C, Y'/C3.58, or Y'/C4.43 - is a hybrid of component and composite systems. The S stands for separate: In S-video, luma and modulated chroma signals are conveyed across the interface on separate wires. There are three different versions of the S-video interface: S-video-525, S-video-525-J (for NTSC-J, in Japan), and S-video-625. The S-video interface was first introduced in S-VHS VCRs, and S-video remains a feature of virtually all S-VHS and Hi8 VCRs. However, the S-video interface is quite independent of S-VHS recording technology.



Quadrature modulation is intrinsic to S-video, so 
chroma suffers some bandwidth reduction. However, 
S-video does not use frequency interleaving: Luma and 
chroma are not summed, so cross-luma and cross-color 
artifacts are completely avoided. Because S-video 
avoids the NTSC or PAL "footprint," it offers substan- 
tially better performance than composite NTSC or PAL. 
S-video is widely used in desktop video and consumer 
video. It is rarely used in the studio. 

Decoder controls 

A few decades ago, consumer receivers had unstable 
circuitry: User adjustment was necessary to produce 
acceptable color. Modern circuits are so stable that user 
controls are no longer necessary, but consumers 
continue to expect them. This is a shame, because 
consumer adjustment of these controls is more likely to 
degrade the picture than to improve it. Developers of 
desktop video systems typically provide an excess of 
controls; perhaps they believe that this relieves them of 
implementing correct signal-processing arithmetic. 



I described the brightness and contrast controls in 
Chapter 3, on page 25. These controls operate in the 
R'G'B' domain; they are found in television receivers
and in computer monitors. Two other controls, which 
I will now describe, are associated with NTSC and PAL 
decoders. 
Figure 28.10 NTSC decoder controls. Saturation alters chroma gain. Hue rotates the [U, V]-plane
as viewed on a vectorscope. These controls were necessitated by the instability of analog receivers.
Modern circuits are sufficiently stable that user adjustment of these controls is no longer necessary.



Figure 28.11 Saturation 
control in a decoder alters 
chroma gain. The compa- 
rable control in processing 
equipment is chroma gain. 



Figure 28.10 shows the block diagram of an NTSC or 
PAL decoder, augmented to reflect the places in the
signal flow where these controls are applied.

The saturation control, whose icon is shown in 
Figure 28.11, alters the gain of modulated chroma prior
to demodulation. (Alternatively, the control can be 
implemented by altering the gain of both decoded 
color difference components.) This control is some- 
times called color, but saturation is preferred because 
it is ambiguous whether color should adjust which 
color (hue) or the amount of color (saturation). This 
control is found in both NTSC and PAL decoders. 






Figure 28.12 Hue control is usually implemented in a decoder by altering phase of regenerated subcarrier, so as to rotate the demodulated [U, V] components. The comparable control in processing equipment is chroma phase.



The hue control, whose icon is shown in Figure 28.12, 
rotates the decoded colors around the [U, V] plot of
Figure 28.1; it is implemented by altering the phase of
regenerated subcarrier. This control is sometimes called 
tint. That name is misleading and should be avoided, 
because to an artist, tint refers to adding white to 
increase lightness and decrease saturation - to an artist, 
to tint a color preserves its hue! As I mentioned on 
page 342, PAL has inherent immunity to hue errors 
arising from differential phase errors; consequently, the 
hue control is usually omitted from PAL decoders. 
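In a digital decoder, both controls can be sketched as arithmetic on demodulated U and V. Saturation is a common gain on both components, which the text notes as an alternative implementation. Shifting the regenerated subcarrier phase amounts to rotating the demodulated [U, V] point. This sketch assumes that a positive hue setting rotates counterclockwise, with U horizontal and V vertical. Real receivers differ in direction and range, and the function name is hypothetical.

```python
import math

def apply_decoder_controls(u, v, saturation=1.0, hue_deg=0.0):
    """Hypothetical model of the decoder controls on demodulated [U, V]:
    hue rotates the point about the origin (assumed counterclockwise for
    positive values), and saturation scales its distance from the origin."""
    h = math.radians(hue_deg)
    u_rot = u * math.cos(h) - v * math.sin(h)
    v_rot = u * math.sin(h) + v * math.cos(h)
    return saturation * u_rot, saturation * v_rot

u, v = apply_decoder_controls(0.3, 0.4, saturation=0.5)
print(round(math.hypot(u, v), 6))   # 0.25: same hue, half the saturation
u, v = apply_decoder_controls(0.3, 0.4, hue_deg=10.0)
print(round(math.hypot(u, v), 6))   # 0.5: hue rotated, saturation unchanged
```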







NTSC and PAL 

frequency interleaving 29 



I introduced the concepts of composite encoding in 
Introduction to composite NTSC and PAL, on page 103. 

In the previous chapter, NTSC and PAL chroma modula- 
tion, I detailed the formation of modulated chroma. In 
the S-video interface used in consumer equipment, 
luma and modulated chroma are conveyed separately. 
However, in most applications of NTSC and PAL, luma 
and modulated chroma are summed to form a single 
composite NTSC or PAL signal. In studio-quality or 
broadcast composite video, summation of these signals 
is based upon the frequency-interleaving principle. 

The frequency-interleaving scheme devised by the NTSC 
places chroma at frequencies that are little-used by 
luma: The NTSC scheme enabled transmission of
1.3 MHz of chroma signal interleaved with - or if you
like, overlaid upon - a 4.2 MHz luma signal. In NTSC, 
frequency interleaving is exploited by a comb filter. 

I will describe frequency interleaving in a moment, but 
first, I will discuss some aspects of notch filtering. 

Notch filtering 

When U and V components are filtered to 1.3 MHz and 
then modulated, chroma is centered on the color 
subcarrier frequency, as I sketched in Figure 28.2, on 
page 338. In NTSC, color subcarrier frequency is about 
3.6 MHz, so modulated chroma extends from about
2.3 MHz to 4.9 MHz. Conventional PAL's color subcar-
rier frequency is about 4.4 MHz, so modulated chroma
extends from about 3.1 MHz to 5.7 MHz.
Figure 29.1 Y'/C spectra are
disjoint if luma is limited to 
2 MHz. 



If luma were bandwidth limited to 2 MHz prior to summing with chroma, frequency occupancy of the two signals would be disjoint, as sketched in Figure 29.1. A cheap decoder could then accomplish separation using the simple scheme sketched in Figure 29.2: A lowpass filter extracts luma, and a bandpass filter centered on the color subcarrier frequency extracts chroma. This scheme is called notch filtering; it is used in VHS VCRs. This scheme has the obvious disadvantage of offering at most 2.3 MHz of luma bandwidth in NTSC (or 3.1 MHz in conventional PAL). The pictures suffer poor sharpness compared to the frequency interleaving scheme that I will describe, whereby luma and chroma share frequencies between 2.3 MHz (or 3.1 MHz) and the top of the luma band.
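The band edges quoted above follow from the subcarrier frequency plus or minus the 1.3 MHz baseband chroma bandwidth. This sketch uses the nominal subcarriers, 315/88 MHz for NTSC and 4.43361875 MHz for conventional PAL. Under notch filtering, the lower edge is also the most luma bandwidth that keeps the spectra disjoint.

```python
NTSC_FSC_MHZ = 315.0 / 88.0      # NTSC color subcarrier, about 3.58 MHz
PAL_FSC_MHZ = 4.43361875         # conventional PAL color subcarrier, about 4.43 MHz

def chroma_band(fsc_mhz, baseband_mhz=1.3):
    """Band occupied by modulated chroma: subcarrier +/- baseband width."""
    return fsc_mhz - baseband_mhz, fsc_mhz + baseband_mhz

for name, fsc in (("NTSC", NTSC_FSC_MHZ), ("PAL", PAL_FSC_MHZ)):
    low, high = chroma_band(fsc)
    # With notch filtering, luma must stay below `low` for disjoint spectra.
    print(name, round(low, 1), round(high, 1))
# NTSC 2.3 4.9
# PAL 3.1 5.7
```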
Figure 29.2 Y'/C separation
using notch filtering involves 
a lowpass filter to extract luma 
and a bandpass filter to extract 
modulated chroma. The term 
notch is a misnomer, as 
explained in the text. 



I described the simple-minded scheme, and sketched 
Figure 28.2, as if 1.3 MHz of baseband chroma could be 
decoded. In the studio, composite bandwidth of 
5.5 MHz or more is available, so modulated chroma 
bandwidth of about 1.3 MHz is attainable. In conven- 
tional PAL, the color subcarrier is about 4.4 MHz; with 
1.3 MHz chroma, the upper sideband of modulated 
chroma would extend to about 5.7 MHz. However, 
PAL-B/G/H transmission is limited to 5.0 MHz of video 
bandwidth, so just 600 kHz of chroma bandwidth is 
recoverable. (The PAL-I system, used in the U.K., has 
5.5 MHz of composite video bandwidth; in PAL-I, 
about 1.1 MHz of chroma is recoverable.) 



In the NTSC studio, about 1.3 MHz of modulated 
chroma bandwidth is available, but for NTSC transmis- 
sion the situation is more complicated. NTSC chroma is 
modulated onto a 3.58 MHz subcarrier, and an NTSC 
transmitter is limited to 4.2 MHz: Only about 600 kHz 
is available for the upper sideband of modulated 
chroma. The designers of NTSC devised a scheme 
whereby chroma bandwidth of 1.3 MHz could be 
achieved for one of the two color difference compo- 
nents. Sadly, the scheme fell into disuse. I will detail the 
scheme, and its demise, in the following chapter, NTSC
Y'IQ system. Today, chroma bandwidth in the studio is
typically 1.3 MHz, but chroma bandwidth of terrestrial 
NTSC broadcast is effectively limited to 600 kHz. 
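The bandwidth figures above follow from a subtraction: the upper sideband of modulated chroma can extend only from the subcarrier frequency up to the top of the transmitted video band. The sketch below assumes the nominal subcarrier frequencies (315/88 MHz for NTSC, 4.43361875 MHz for PAL-B/G/H/I) together with the transmission bandwidths quoted in the text; the function and table names are illustrative only.

```python
NTSC_FSC_MHZ = 315 / 88      # NTSC color subcarrier, about 3.579545 MHz
PAL_FSC_MHZ = 4.43361875     # PAL-B/G/H/I color subcarrier


def upper_sideband_room_mhz(video_bandwidth_mhz, fsc_mhz):
    """Bandwidth left above subcarrier for the upper sideband of chroma."""
    return max(0.0, video_bandwidth_mhz - fsc_mhz)


# (system, transmitted video bandwidth in MHz, subcarrier in MHz)
SYSTEMS = [
    ("PAL-B/G/H", 5.0, PAL_FSC_MHZ),
    ("PAL-I", 5.5, PAL_FSC_MHZ),
    ("NTSC terrestrial", 4.2, NTSC_FSC_MHZ),
]

if __name__ == "__main__":
    for name, bandwidth, fsc in SYSTEMS:
        room_khz = upper_sideband_room_mhz(bandwidth, fsc) * 1000
        print(f"{name:17s} about {room_khz:4.0f} kHz of chroma bandwidth")
    # How far a full 1.3 MHz PAL upper sideband would reach:
    print(f"PAL with 1.3 MHz chroma reaches {PAL_FSC_MHZ + 1.3:.2f} MHz")
```

Run as a script, it reports roughly 566 kHz, 1066 kHz, and 620 kHz, which the text rounds to 600 kHz, 1.1 MHz, and 600 kHz, and it shows a 1.3 MHz PAL sideband reaching about 5.73 MHz.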
Figure 29.3 NTSC chroma is
limited to 600 kHz upon
terrestrial VHF/UHF broadcast,
even if wideband
components are presented to
the encoder. (The original
NTSC Y'IQ scheme allowed
1.3 MHz bandwidth for one
chroma component, but that
scheme is now abandoned.) In
the studio, chroma bandwidth
of 1.3 MHz is available.



A notch filter is sometimes 
called a trap. 




Today's NTSC situation is sketched in Figure 29.3: U 
and V signals limited to 600 kHz are shown at the left. 
Luma, and the composite signal, are shown at the right. 
So, a decoder for terrestrial VHF/UHF NTSC should 
bandlimit demodulated chroma to about 600 kHz. You 
might think that all other components of the composite 
signal should be presumed to be luma, and the word 
notch suggests that a complementary bandstop filter
should be used to recover luma. There are two prob- 
lems with that approach. One problem relates to broad- 
cast NTSC signals, the other to VHS recording. 

• Broadcast NTSC video is bandlimited to about 4.2 MHz. 
A notch filter that rejects components within 600 kHz 
of the 3.58 MHz subcarrier will reject everything up to 
the top of the luma band! There is no reason to reject 
only frequencies centered on subcarrier; all high
frequencies might as well be rejected. A lowpass filter 
ought to be used for luma, instead of a notch filter. 

• In VHS recording, luma is limited to a bandwidth of 
about 2 MHz, and baseband chroma is limited to about 
300 kHz. The situation of Figure 29.1 pertains, and 
there is no need for a luma notch filter! 
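The claim in the first bullet is easy to check numerically: a stopband of ±600 kHz around the NTSC subcarrier reaches almost exactly the 4.2 MHz top of the broadcast band. This sketch assumes the nominal 315/88 MHz subcarrier; the constant and function names are illustrative.

```python
NTSC_FSC_MHZ = 315 / 88          # about 3.579545 MHz
BROADCAST_TOP_MHZ = 4.2          # top of the NTSC broadcast luma band
CHROMA_HALF_WIDTH_MHZ = 0.6      # chroma bandlimited to about 600 kHz


def notch_stopband(fsc_mhz, half_width_mhz):
    """Edges (low, high) of a band-stop region centered on subcarrier."""
    return fsc_mhz - half_width_mhz, fsc_mhz + half_width_mhz


low, high = notch_stopband(NTSC_FSC_MHZ, CHROMA_HALF_WIDTH_MHZ)
print(f"notch rejects {low:.2f} to {high:.2f} MHz; band tops out at {BROADCAST_TOP_MHZ} MHz")
# Only about 20 kHz survives above the stopband, so in practice the "notch"
# behaves like a lowpass filter with its corner near `low`.
```

The stopband runs from about 2.98 MHz to 4.18 MHz, leaving essentially nothing between its upper edge and the top of the band.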

So, the term notch filter is a misnomer. In both broad- 
cast and consumer VHS video, unless comb filtering is 
used, a decoder should use a lowpass filter to extract 
luma and a bandpass filter to extract chroma. However, 
much better separation is obtained by using a comb 
filter to exploit frequency interleaving. 
Figure 29.4 Color subcarrier in NTSC has its frequency chosen so that subcarrier phase inverts
line-to-line. If U and V color difference components are constant line-to-line, then the phase of
modulated chroma will invert line-to-line; this enables a comb filter to separate luma and chroma.
A frame comprises an odd integer number of lines (525); it contains an odd integer number of
half-cycles of subcarrier. Thus, subcarrier inverts phase frame-to-frame. This leads to NTSC
colorframes A and B.







Frequency interleaving in NTSC 

In studio video, the frequency of unmodulated NTSC 
color subcarrier is an odd multiple of half the line rate. 
This relationship causes subcarrier to invert phase line 
to line with respect to sync, as sketched in Figure 29.4 
above. The luma and modulated chroma components 
sketched at the bottom right of the composite NTSC 
spectrum in Figure 29.3 appear, at the macro scale, to 
overlay each other. However, the line-to-line inversion
of subcarrier phase produces frequency interleaving: 
Viewed at a finer scale, the spectra interleave. 
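The phase relationships can be checked with exact rational arithmetic. The sketch assumes the standard NTSC ratio in which subcarrier is 455/2 times the line rate (455 is odd), that is, 227.5 cycles per total line, and a 525-line frame; the names are illustrative.

```python
from fractions import Fraction

# NTSC: subcarrier is 455/2 times the line rate (an odd multiple of half the line rate).
FSC_CYCLES_PER_LINE = Fraction(455, 2)   # 227.5 cycles per total line
LINES_PER_FRAME = 525


def subcarrier_phase(lines):
    """Fractional subcarrier phase, in cycles, accumulated over `lines` total lines."""
    return (FSC_CYCLES_PER_LINE * lines) % 1


print(subcarrier_phase(1))                     # 1/2: inverts line to line
print(subcarrier_phase(LINES_PER_FRAME))       # 1/2: inverts frame to frame
print(subcarrier_phase(2 * LINES_PER_FRAME))   # 0: repeats every two frames (A, B)
print(4 * FSC_CYCLES_PER_LINE)                 # 910 samples per total line at 4fsc
```

A half-cycle remainder per line is exactly a 180° inversion, and the same remainder over 525 lines gives the two-frame colorframe sequence.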

Frequency interleaving enables separation of luma and 
chroma using a comb filter. A simple ("2-line," or 1H)
comb filter has storage for one line of composite video. 
Provided that luma and chroma are similar line-to-line, 
if vertically adjacent samples are summed, chroma 
tends to cancel and luma tends to average. If vertically 
adjacent samples are differenced, luma tends to cancel 
and chroma tends to average. This interpretation is 
sketched in Figure 29.5 in the margin. Figure 29.6 at 
the top of the facing page sketches the implementation 
of a 2-line comb filter in NTSC. A digital comb filter 
uses digital memory; the memory element of an analog 
comb filter is typically an ultrasonic glass delay line. 



Figure 29.5 Y'/C separation in
a 2-line (1H) NTSC comb filter
is based upon line-by-line 
inversion of subcarrier phase. 



In 4fsc NTSC, if the composite signal comprises the
sample sequence {Y'+I, Y'+Q, Y'-I, Y'-Q} on one line,
NTSC's subcarrier frequency interleaving causes it to
take the values {Y'-I, Y'-Q, Y'+I, Y'+Q} on the next. The
Figure 29.6 NTSC 2-line comb filter separates Y' and C using a 1H delay element. This is more
complex than a notch filter; however, cross-luma and cross-color artifacts are greatly reduced. 



In Consumer analog NTSC and 
PAL, on page 579, I will discuss 
the decoding of video from 
consumer devices. 



sum of two vertically adjacent samples approximates 
luma; their difference approximates modulated chroma. 
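The sum-and-difference rule can be demonstrated on idealized 4fsc sample values. This is a minimal sketch that assumes Y', I, and Q are constant along each line and uses illustrative numbers; `ntsc_4fsc_line` and `comb_2line` are hypothetical helper names. The second case lets luma change between the two lines, so the separation is no longer exact.

```python
def ntsc_4fsc_line(y, i, q, line):
    """Idealized 4fsc NTSC composite samples for one line.

    Assumes Y', I and Q are constant across the line; subcarrier phase
    inverts on alternate lines, so the chroma terms change sign.
    """
    s = 1 if line % 2 == 0 else -1
    return [y + s * i, y + s * q, y - s * i, y - s * q]


def comb_2line(previous, current):
    """2-line (1H) comb: half the sum estimates luma, half the difference
    estimates the current line's modulated chroma."""
    luma = [(c + p) / 2 for p, c in zip(previous, current)]
    chroma = [(c - p) / 2 for p, c in zip(previous, current)]
    return luma, chroma


# Vertically uniform picture: separation is exact.
above = ntsc_4fsc_line(0.5, 0.2, 0.1, line=0)
below = ntsc_4fsc_line(0.5, 0.2, 0.1, line=1)
luma, chroma = comb_2line(above, below)
print([round(v, 3) for v in luma])     # [0.5, 0.5, 0.5, 0.5]
print([round(v, 3) for v in chroma])   # [-0.2, -0.1, 0.2, 0.1]

# Luma steps from 0.5 to 0.7 between the lines: the "chroma" estimate
# picks up half the step (0.1) in every sample, and luma averages to 0.6.
below_step = ntsc_4fsc_line(0.7, 0.2, 0.1, line=1)
luma_step, chroma_step = comb_2line(above, below_step)
print([round(v, 3) for v in chroma_step])
```

The second case previews the cross-color behavior described below: any vertical change leaks into the wrong output.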

In a decoder that uses a notch filter, NTSC chroma 
inverts in phase line-by-line even if mistakenly decoded 
as luma. When integrated vertically by large spot size at 
the display, and when integrated vertically in the 
viewer's eye, luma tends to average (that is, to be rein- 
forced) and chroma tends to cancel. This visual filtering 
is what allowed color to be retrofitted into black-and- 
white television: The newly added modulated chroma 
component averaged out, and thereby was not very
visible, when viewed on a black-and-white receiver.

Proper operation of a comb filter depends upon the 
video signal having a stable timebase and coherent
subcarrier. In addition, comb filtering is only sensible if 
the video signal has luma content above 3 MHz. All of 
these conditions hold for broadcast signals, and 
luma/chroma separation should use comb filtering. If 
any of these conditions fails, as is likely with video orig- 
inating from a consumer device, then comb filtering is 
likely to introduce artifacts and it should be defeated; 
a notch filter should be used instead. 



In later sections of this chapter, I will explain comb 
filtering in the spatial frequency (two-dimensional) 
domain and in the one-dimensional frequency domain. 

Cross-luma and cross-color 

If neither luma nor chroma changes vertically, vertical 
summing and differencing in a comb filter can perfectly 
separate luma and chroma! In a solid-colored area of an 
Figure 29.7 Dot crawl is exhibited
when saturated colors side-by-side
are subject to NTSC
decoding. Dot crawl is most
evident on the green-magenta
transition of the colorbar signal.



Figure 29.8 Hanging dots are 
evident when the colorbar 
signal has been subject to NTSC 
decoding by a simple two-line 
nonadaptive comb filter. 



image, comb filtering works well. However, to the 
extent that either luma or chroma changes vertically, 
chroma leaks ("crosses") into luma, and luma leaks 
("crosses") into chroma. A decoder with a notch filter 
will produce cross-color artifacts that may appear as 
swirling rainbows, when luma occupying frequencies in 
the range of subcarrier "crosses into" (and is mistakenly
decoded as) chroma. A notch filter can also introduce
cross-luma artifacts, when chroma "crosses into" (and
is mistakenly decoded as) luma.

Consider a set of image rows, each containing the same 
large, abrupt change in color, say from one colored
vertical bar to another in the colorbar test pattern to be 
explained in Colorbars, on page 535. Most decoders will 
mistakenly interpret some of the power in this transi- 
tion to be luma. The mistakenly decoded luma will not 
only invert in phase line to line, it will also invert frame 
to frame. The frame-rate inversion, combined with 
interlace, produces a fine pattern of dots, depicted in 
Figure 29.7. The dots apparently travel upward along 
the transition at a rate of one image row per field time. 
(In 480i, each dot takes about eight seconds to traverse
the height of the image.) This particular cross-luma arti- 
fact is called dot crawl. It can be avoided in an NTSC 
decoder by the use of a comb filter. 
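The eight-second figure is a simple ratio: a dot that climbs one image row per field crosses 480 rows in 480 field times. This sketch assumes the 60/1.001 Hz (about 59.94 Hz) field rate of 480i; the names are illustrative.

```python
PICTURE_ROWS_480I = 480          # image rows in 480i
FIELD_RATE_HZ = 60 / 1.001       # about 59.94 fields per second


def dot_crawl_traverse_seconds(rows, field_rate_hz):
    """Time for a dot moving one image row per field to cross the picture."""
    return rows / field_rate_hz


print(f"{dot_crawl_traverse_seconds(PICTURE_ROWS_480I, FIELD_RATE_HZ):.2f} s")  # 8.01 s
```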

Another cross-luma artifact is apparent when the 
SMPTE colorbar test pattern is decoded using a simple 
2-line (1H) comb filter. About 2/3 of the way down the
pattern, the image contains highly saturated comple- 
mentary colors that abut vertically: There is an abrupt 
change from a line containing a set of saturated colors 
to a line containing the same colors in a different order. 
When decoded by a simple 2-line NTSC comb filter, 
each abrupt vertical transition contains power that 
decodes to a strong luma component at the subcarrier 
frequency. On a monitor with sufficiently high resolu- 
tion, stationary, horizontal patterns of hanging dots, 
depicted schematically in Figure 29.8, are displayed at 
several of the transitions. The artifact is strikingly 
obvious when colorbars are displayed on a studio 
monitor equipped with a comb filter. 
The introduction of cross-luma and cross-color artifacts
at an NTSC decoder can be minimized by using a comb 
filter. However, encoders rarely perform any processing 
(beyond bandpass filtering of modulated chroma) on 
luma and chroma prior to their being summed. If luma 
and chroma components overlap in spatial frequency, 
they will be confused upon summation at the encoder, 
and no subsequent processing in a decoder can possibly 
repair the damage: The composite footprint is said to be 
imposed on the signal by the first encoder. 

Highly sophisticated encoders include comb filter 
preprocessing to prevent the introduction, at the 
encoder, of severe cross-color and cross-luma artifacts. 
However, broadcasters have never deployed these 
encoders in any significant numbers. This has placed an 
upper bound on the quality of video delivered to those 
consumers having high-quality receivers. 

Frequency interleaving in PAL 

In PAL chroma modulation, on page 341, I described the 
V-axis switch, which introduces line-by-line phase
inversion of the V chroma component. PAL differs from
NTSC in two other significant ways that I will describe 
in this section; other minor differences will be described 
in 576i PAL composite video, on page 529. 



In studio NTSC, subcarrier frequency is an odd multiple
of one-half the line rate. In PAL, subcarrier frequency is
based on an odd multiple of one-quarter the line rate:

$$ f_{\mathrm{SC},\,\text{PAL-B/G/H/I}} \approx \frac{1135}{4}\, f_{\mathrm{H},\,\text{576i}} $$

On its own, this would lead to roughly a 90° delay of 
subcarrier phase line-by-line; see Figure 29.9 overleaf. 
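The quarter-cycle behavior can be checked with exact fractions. This sketch uses the 1135/4 ratio shown above (283.75 cycles per total line, before the +25 Hz offset) and a 625-line frame; the function name is illustrative. An advance of 3/4 cycle is equivalent to a delay of 1/4 cycle, which is how the function reports it.

```python
from fractions import Fraction

# Basic 576i PAL subcarrier, before the +25 Hz offset: 1135/4 cycles per total line.
FSC_CYCLES_PER_LINE = Fraction(1135, 4)   # 283.75
LINES_PER_FRAME = 625


def phase_delay_degrees(lines):
    """Subcarrier phase delay, in degrees (0 <= d < 360), after `lines` total lines."""
    advance = (FSC_CYCLES_PER_LINE * lines) % 1      # fractional cycles gained
    return float((1 - advance) % 1 * 360)


print(phase_delay_degrees(1))                    # 90.0: quarter-cycle delay per line
for frames in range(1, 5):
    print(frames, phase_delay_degrees(frames * LINES_PER_FRAME))
```

Frame by frame, the delay steps through 90°, 180°, 270°, and back to 0°, which is the origin of the four-frame PAL sequence.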



In PAL-M, the offset is absent. 



See 576i PAL color subcarrier, on
page 375. PAL was developed in
Hannover [sic], Germany, at the
research labs of Telefunken. In 
most video literature, the name of 
the city is Anglicized and spelled 
with a single n. 



In standard 576i (625/50) PAL, a +25 Hz frequency
offset is added to the basic subcarrier frequency to
reduce the visibility of a cross-luma artifact called
Hannover bars. The +25 Hz offset contributes a +0.576°
phase advance (0.0016 of a subcarrier cycle) to subcarrier
phase line-by-line. This phase advance leads to the
nonlinelocked characteristic, and the noninteger
number of samples per total line, of 576i, 4fsc PAL.
Since the offset adds exactly one subcarrier cycle over
the duration of a frame, it has no impact on the
four-frame sequence. Historically, some 576i PAL test signal
Figure 29.9 Color subcarrier in 576i PAL has a line-to-line phase delay of about 1/4 cycle, that is,
90°. A frame comprises an odd number of lines, so it contains an odd integer number of
quarter-cycles of subcarrier: Subcarrier is delayed 90° frame-to-frame. This leads to PAL colorframes
I, II, III, and IV. A +25 Hz offset is added to this basic frequency; the offset alters the line-to-line
phase delay very slightly (by 0.576°, to 89.424°), but does not alter the frame-to-frame sequence.



generators were simplified by omitting the +25 Hz 
offset. A PAL signal without the +25 Hz offset is called 
nonmathematical. (PAL signals where scanning and color 
subcarrier are incoherent, such as the signals to be 
described in Degenerate analog NTSC and PAL, on 
page 581, can also be considered nonmathematical.) 
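The numbers quoted for the +25 Hz offset follow from the 15 625 Hz line rate of 576i. This sketch, using exact fractions and illustrative names, derives the per-line offset (0.0016 cycle, 0.576°), shows that it totals exactly one cycle per 625-line frame, and computes the resulting noninteger count of 4fsc samples per total line.

```python
from fractions import Fraction

F_H = Fraction(15625)                        # 576i line rate, Hz
OFFSET_HZ = Fraction(25)                     # PAL +25 Hz subcarrier offset
FSC = Fraction(1135, 4) * F_H + OFFSET_HZ    # 4 433 618.75 Hz
LINES_PER_FRAME = 625

offset_cycles_per_line = OFFSET_HZ / F_H                            # 1/625 cycle
offset_degrees_per_line = offset_cycles_per_line * 360              # 0.576 degrees
offset_cycles_per_frame = offset_cycles_per_line * LINES_PER_FRAME  # exactly 1 cycle
samples_per_total_line = 4 * FSC / F_H                              # 1135 + 4/625

print(float(FSC))                       # 4433618.75
print(float(offset_cycles_per_line))    # 0.0016
print(float(offset_degrees_per_line))   # 0.576
print(offset_cycles_per_frame)          # 1
print(float(samples_per_total_line))    # 1135.0064
```

Because the sample count per total line is not an integer, 4fsc sampling of mathematical PAL cannot be line-locked, while the whole-cycle-per-frame result shows why the four-frame sequence is unaffected.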



Figure 29.10 Chroma arrangement
in 4fsc PAL. In "mathematical"
576i PAL, this diagram
applies at just one location in
a field; however, successive lines
have a +0.576° phase offset. In
nonmathematical PAL, this
pattern tiles each field.



In 576i PAL sampled at 4fsc, the near-90° line-to-line
delay of subcarrier phase, combined with the inversion
of the modulated V component, causes a 4×4 block of
chroma samples in a field to take values such as those
of Figure 29.10.
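The arrangement can be explored with a small numerical model. The sketch below is a hypothetical simplification, not the sampling phase of a real 4fsc PAL system: it ignores the +25 Hz offset (nonmathematical PAL) and assumes that sampling instants fall on the U and V axes, so it will not necessarily reproduce the exact values of Figure 29.10. It models the quarter-cycle line-to-line delay and the V-axis switch, then tries a one-line-delay separator. Without a phase shift, vertical sums mix U and V; with the delayed line advanced by 90°, the vertical sum contains only V terms and the difference only U terms (in this model).

```python
import math


def pal_chroma_sample(u, v, line, sample, advance_deg=0.0):
    """Modulated chroma at one 4fsc sample, in a simplified PAL model.

    Hypothetical simplifications: U and V are constant; the +25 Hz offset is
    ignored (nonmathematical PAL); sampling instants fall on the U and V axes.
    Subcarrier phase advances 90 degrees per sample and is delayed 90 degrees
    per line; the V-axis switch inverts V on alternate lines.
    """
    theta = math.radians(90.0 * sample - 90.0 * line + advance_deg)
    v_switch = 1 if line % 2 == 0 else -1
    return u * math.sin(theta) + v_switch * v * math.cos(theta)


def chroma_block(u, v, lines=4, samples=4):
    """A lines-by-samples block of chroma values, rounded for display."""
    # Adding 0.0 turns any -0.0 produced by rounding into 0.0.
    return [[round(pal_chroma_sample(u, v, n, k), 3) + 0.0 for k in range(samples)]
            for n in range(lines)]


U, V = 0.3, 0.2
for row in chroma_block(U, V):
    print(row)   # each row holds +/-U and +/-V in a rotating order

# One-line-delay separation: line 0 is the delayed line, line 1 the current line.
current = [pal_chroma_sample(U, V, 1, k) for k in range(4)]
delayed = [pal_chroma_sample(U, V, 0, k) for k in range(4)]
delayed_shifted = [pal_chroma_sample(U, V, 0, k, advance_deg=90.0) for k in range(4)]

mixed = [round(c + d, 3) for c, d in zip(current, delayed)]            # U and V mixed
v_terms = [round(c + d, 3) for c, d in zip(current, delayed_shifted)]  # only +/-2V or 0
u_terms = [round(c - d, 3) for c, d in zip(current, delayed_shifted)]  # only +/-2U or 0
print(mixed, v_terms, u_terms)
```

With U = 0.3 and V = 0.2, the plain vertical sum yields values of ±0.1 (that is, ±(U - V)), while the phase-shifted sum and difference yield only ±0.4 (2V) and ±0.6 (2U) interleaved with zeros.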
An ideal U/V separator for PAL would incorporate
a delay element having a duration of exactly one line
time: This would make available vertically aligned data.
(In 576i PAL, one line time is almost exactly 283 3/4
cycles of subcarrier.) However, with exactly one line of
delay, the modulated chroma samples are in the wrong
vertical arrangement for easy separation: Summing two
vertically adjacent samples yields neither U nor V. In
addition to a line delay, a U/V separator ideally needs
a 90° phase shift element to advance the phase of
modulated chroma by 90°. Vertical summation would
then extract the V component; differencing would